Joern Cheat Sheet

Joern Notes

Joern is a static analyzer that creates code property graphs and makes them fairly easy to query. It is a good alternative to CodeQL because analyzing a project with Joern does not require compiling or building it. I often use it when I can't analyze a codebase with CodeQL or Snyk's internal static analysis engine.

Install Joern (Linux)

Pre-requisites

apt install source-highlight graphviz unzip

Setup Joern CLI

mkdir joern && cd joern # optional
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh
chmod u+x joern-install.sh
./joern-install.sh --interactive

Import a Project, create CPG and load to console

joern> importCode("crow") 
Using generator for language: NEWC: CCpgGenerator
Creating project `crow` for code at `crow`
moving cpg.bin.zip to cpg.bin because it is already a database file
Creating working copy of CPG to be safe
Loading base CPG from: /home/snoopy/joern-workshop/workspace/crow/cpg.bin.tmp
Code successfully imported. You can now query it using `cpg`.
For an overview of all imported code, type `workspace`.
Adding default overlays to base CPG
The graph has been modified. You may want to use the `save` command to persist changes to disk.  All changes will also be saved collectively on exit
res0: Cpg = io.shiftleft.codepropertygraph.generated.Cpg@29345a91

The argument "crow" is the folder that contains the source code to analyze.

Joern keeps every analyzed project in its workspace. Type workspace to list them:

joern> workspace 
res1: workspacehandling.WorkspaceManager[JoernProject] = 
____________________________________________________________________________________________
| name    | overlays                           | inputPath                          | open  |
|===========================================================================================|
| NodeBB1 |                                    | /home/snoopy/joern-workshop/NodeBB | false |
| NodeBB  | controlflow,typerel,base,callgraph | /home/snoopy/joern-workshop/NodeBB | false |

open loads an already analyzed project from the workspace, and importCpg loads an existing CPG file (for example a cpg.bin).

joern> open("/home/snoopy/joern-workshop/NodeBB"); 
Passing paths to `loadCpg` is deprecated, please use a project name
Creating working copy of CPG to be safe
Loading base CPG from: /home/snoopy/joern-workshop/workspace/NodeBB/cpg.bin.tmp
res2: Option[workspacehandling.Project] = Some(
  value = Project(
    projectFile = ProjectFile(inputPath = "/home/snoopy/joern-workshop/NodeBB", name = "NodeBB"),
    path = /home/snoopy/joern-workshop/workspace/NodeBB,
    cpg = Some(value = io.shiftleft.codepropertygraph.generated.Cpg@762e3836)
  )
)

Opening the project by name avoids the deprecation warning:

joern> open("NodeBB")
Creating working copy of CPG to be safe
Loading base CPG from: /home/snoopy/joern-workshop/workspace/NodeBB/cpg.bin.tmp
res1: Option[workspacehandling.Project] = Some(
  value = Project(
    projectFile = ProjectFile(
      inputPath = "/home/snoopy/joern-workshop/NodeBB",
      name = "NodeBB"
    ),
    path = /home/snoopy/joern-workshop/workspace/NodeBB,
    cpg = Some(value = io.shiftleft.codepropertygraph.generated.Cpg@1c65740a)
  )
)

joern>

Searching

// search for methods with "sanitize" in their name; the argument is a regex
joern> cpg.method.name(".*sanitize.*").name.l 
res4: List[String] = List("sanitizeSignature", "sanitize")

// dump the code block that matched the search
joern> cpg.method.name(".*find.*").dump 
res3: List[String] = List(
  """static Map* find_hash( ino_t ino, dev_t dev, off_t size, time_t ct ); /* <=== */ 
""",
  """static Map* /* <=== */ 
find_hash( ino_t ino, dev_t dev, off_t size, time_t ct )
    {
    unsigned int h, he, i;
    Map* m;

    h = hash( ino, dev, size, ct );
    he = ( h + hash_size - 1 ) & hash_mask;
    for ( i = h; ; i = ( i + 1 ) & hash_mask )
	{
	m = hash_table[i];
	if ( m == (Map*) 0 )
	    break;
	if ( m->hash == h && m->ino == ino && m->dev == dev &&
	     m->size == size && m->ct == ct )
	    return m;
	if ( i == he )
	    break;
	}
    return (Map*) 0;
    }

Other useful commands

  • cpg.method.name("parse_public_key_packet").local.name.l - Find all local variables defined in a method
  • cpg.method.name("parse_public_key_packet").location.map(x => (x.lineNumber.get, x.filename)).l - Find which file and line number the method is in
  • cpg.method.name("parse_public_key_packet").local.typ.name.l.head - Find the type of the first local variable defined in a method
  • cpg.method.name("parse_public_key_packet").callOut.name.l - Find all outgoing calls (call-sites) in a method
  • cpg.method.name("parse_public_key_packet").caller.name.l - Find which methods call a method
  • cpg.types.name("vlc_.*").localsOfType.name.l - List all local variables whose type matches vlc_.*
  • cpg.types.name("vlc_log_t").map(x => (x.name, x.start.member.name.l)).l - Find member variables of a struct
  • cpg.local.where(_.typ.name("vlc_log_t")).name.l - Find local variables and filter them by their type
  • cpg.local.where(_.typ.name("vlc_log_t")).method.dump - Which methods are they used in?
  • cpg.local.where(_.typ.name("vlc_log_t")).method.file.name.l - Get the names of the files that contain these methods
  • cpg.method.filter(_.parameter.size > 4).signature.l - Identify functions with more than 4 parameters
  • cpg.method.filter(_.controlStructure.size > 4).name.l - Identify functions with more than 4 control structures (a rough proxy for cyclomatic complexity)
  • cpg.method.filter(_.numberOfLines >= 500).name.l - Identify functions with 500 or more lines of code
  • cpg.method.filter(_.ast.isReturn.l.size > 1).name.l - Identify functions with multiple return statements
  • cpg.method.filter(_.ast.isControlStructure.parserTypeName("(For|Do|While).*").size > 4).name.l - Identify functions with more than 4 loops
  • cpg.method.filter(_.depth(_.isControlStructure) > 3).name.l - Identify functions with nesting depth larger than 3
  • cpg.method.name("find_hash").repeat(_.caller)(_.emit).name.l - Find all direct and indirect callers of a method
  • cpg.method.external.name.l.distinct.sorted - All names of external methods used by the program
  • cpg.call("str.*").code.l - All calls to functions whose name starts with "str"
  • cpg.call("strcpy").method.name.l - All methods that call strcpy
  • cpg.call("sprintf").argument(2).filterNot(_.isLiteral).code.l - Looking into parameters: calls where the second argument to sprintf is NOT a literal
  • cpg.call("sprintf").argument(2).filterNot(_.isLiteral).dump - Dump the surrounding code for the matches of the previous query
  • cpg.method.name("parse_public_key_packet").dot |> "/tmp/foo.dot" - Write the dot representation of the AST of every method matching the name to a file
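
To try a few of the queries above without importing a large project, a small throwaway C file can be handy. The sketch below is hypothetical: none of these function names come from the projects used in this post, and each function is only written so that one of the queries has an obvious candidate to match. Whether a given query actually reports it may depend on the Joern version and frontend.

```c
/* sample_targets.c: hypothetical functions for experimenting with the
   queries above. Nothing here comes from the projects shown in this post. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* More than 4 parameters: candidate for the parameter-count query. */
int sum5(int a, int b, int c, int d, int e) {
    return a + b + c + d + e;
}

/* Several return statements: candidate for the isReturn query. */
int sign_of(int v) {
    if (v < 0)
        return -1;
    if (v > 0)
        return 1;
    return 0;
}

/* Calls strcpy: candidate for cpg.call("strcpy").method.name.l.
   The caller must guarantee that dst can hold src. */
void copy_name(char *dst, const char *src) {
    assert(dst != NULL && src != NULL);
    strcpy(dst, src);
}

/* The second argument to sprintf is a variable, not a literal:
   candidate for the argument(2).filterNot(_.isLiteral) query. */
int format_value(char *out, const char *fmt, int value) {
    assert(out != NULL && fmt != NULL);
    return sprintf(out, fmt, value);
}

/* Three loops and an if nested inside each other:
   candidate for the nesting-depth query. */
int count_small_triples(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            for (int k = 0; k < n; k++) {
                if (i + j + k < n)
                    count++;
            }
        }
    }
    return count;
}
```

The file has no main on purpose: Joern does not need to build it, so a folder containing only this file can be passed to importCode in the same way as "crow" earlier.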

Exporting Graphs

Joern can create the following graph representations for C/C++ code:

  • Abstract Syntax Trees (AST)
  • Control Flow Graphs (CFG)
  • Control Dependence Graphs (CDG)
  • Data Dependence Graphs (DDG)
  • Program Dependence Graphs (PDG)
  • Code Property Graphs (CPG14)

Example: tab completion on plotDot lists the available plots for a method

joern> cpg.method.name("finish_connection").plotDotAst 
plotDotAst    plotDotCdg    plotDotCfg    plotDotCpg14  plotDotDdg    plotDotPdg

In most cases, plotDotCpg14 is the most useful graph. It combines the AST, CFG and PDG into a single graph. The code property graph is described in more detail here: https://www.sec.cs.tu-bs.de/pubs/2014-ieeesp.pdf

Joern supports several analysis layers (overlays), some of which are applied by default. Typing run lists them, and run.name applies one of them, for example run.ossdataflow.

joern> run 
res1: OverlaysDynamic = 
__________________________________________________________________________
| name          | description                                             |
|=========================================================================|
| callgraph     | Call graph layer                                        |
| controlflow   | Control flow layer (including dominators and CDG edges) |
| base          | base layer (linked frontend CPG)                        |
| typerelations | Type relations layer (hierarchy and aliases)            |
| dumpast       | Dump abstract syntax trees to out/                      |
| dumpcfg       | Dump control flow graph to out/                         |
| dumpcdg       | Dump control dependence graph to out/                   |
| dumppdg       | Dump program dependence graph to out/                   |
| scan          | Joern Code Scanner                                      |
| dumpddg       | Dump data dependence graphs to out/                     |
| commit        | Apply current custom diffgraph                          |
| ossdataflow   | Layer to support the OSS lightweight data flow tracker  |
| dumpcpg14     | Dump Code Property Graph (2014) to out/                 |

Run Dataflow analysis

joern> run.ossdataflow
The graph has been modified. You may want to use the `save` command to persist changes to disk.  All changes will also be saved collectively on exit
res2: Cpg = io.shiftleft.codepropertygraph.generated.Cpg@34e4a136

joern> save 
Saving graphs on disk. This may take a while.
Turning working copy into new persistent CPG
Creating working copy of CPG to be safe
Loading base CPG from: /home/snoopy/joern-workshop/workspace/alloc_party/cpg.bin.tmp
res3: List[workspacehandling.Project] = List(
  Project(
    projectFile = ProjectFile(
      inputPath = "/home/snoopy/joern-workshop/alloc_party",
      name = "alloc_party"
    ),
    path = /home/snoopy/joern-workshop/workspace/alloc_party,
    cpg = Some(value = io.shiftleft.codepropertygraph.generated.Cpg@4a65ce06)
  )
)

joern>  

Define a source method

// alternatively: def source = cpg.method.fullName("main").parameter
joern> def source = cpg.method.name(".*alloc.*").parameter
defined function source

Define a sink method

joern> def sink = cpg.call("malloc").where(_.argument(1).isCallTo(Operators.multiplication)).argument 
defined function sink

Code example

void *alloc_havoc(int y) { //source
  int z = 10;
  void *x = malloc(y * z); //sink
  return x;
}

Find the flows from the source to the sink:

joern>  sink.reachableByFlows(source).p 
res11: List[String] = List(
  """_______________________________________________________________________________________________________
| tracked            | lineNumber| method      | file                                                  |
|======================================================================================================|
| alloc_havoc(int y) | 11        | alloc_havoc | /home/snoopy/joern-workshop/alloc_party/alloc_party.c |
| y * z              | 13        | alloc_havoc | /home/snoopy/joern-workshop/alloc_party/alloc_party.c |
| y * z              | 13        | alloc_havoc | /home/snoopy/joern-workshop/alloc_party/alloc_party.c |
"""
)
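
The sink above is interesting because a caller-controlled factor can make the multiplication overflow, so that malloc receives a smaller size than the code later assumes. A minimal sketch of a guarded variant follows. The names checked_mul and alloc_checked are hypothetical and not part of alloc_party.c; this is only one possible way to write the check.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: multiplies a and b and sets *ok to 0
   when the product would not fit in size_t. */
static size_t checked_mul(size_t a, size_t b, int *ok) {
    assert(ok != NULL);
    *ok = (b == 0 || a <= SIZE_MAX / b);
    return *ok ? a * b : 0;
}

/* Hypothetical guarded allocation: returns NULL instead of allocating
   when count * elem_size would wrap around. */
void *alloc_checked(size_t count, size_t elem_size) {
    int ok;
    size_t total = checked_mul(count, elem_size, &ok);
    if (!ok)
        return NULL;
    return malloc(total); /* the argument is an identifier, not a multiplication */
}
```

Because the product is computed before the call, the first argument of malloc is a plain identifier here. The sink definition above requires that argument to be a multiplication, so it would be expected not to select this variant, which makes it a useful negative case when tuning the query.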

Another example

joern> def source = cpg.method.fullName("exec").parameter
joern> def sink = cpg.call.name("gets").argument.order(1)
joern> sink.reachableByFlows(source).p

Code example

#include <stdio.h>

int grantAccess(const char *name); // defined elsewhere
void privilegedAction(void);       // defined elsewhere

int exec(char *input) { //source
    int allow = 0;
    gets(input); // sink: unbounded read, user inputs "malicious"
    if (grantAccess(input)) {
        allow = 1;
    }
    if (allow != 0) { // may have been overwritten by the overflow of the username
        privilegedAction();
    }
    return 0;
}

int main () {
    char username[8];
    printf("Enter your username, please: ");
    scanf("%7s", username); // read at most 7 characters plus the terminator
    exec(username);
    return 0;
}
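
A negative control can help when checking a query like the one above. The sketch below reads the username with a bounded fgets instead of gets. The helper name read_line is hypothetical and not part of the original example. Since the code contains no call named gets, the sink definition cpg.call.name("gets") has nothing to select, so no flow should be reported for it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most len - 1 characters from `in` into buf
   and strips the trailing newline. Returns 1 on success, 0 on end of input
   or read error. */
int read_line(char *buf, size_t len, FILE *in) {
    assert(buf != NULL && len > 0 && in != NULL);
    if (fgets(buf, (int)len, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0'; /* drop the newline, if one was read */
    return 1;
}
```

With an 8-byte buffer, a call such as read_line(username, sizeof username, stdin) stores at most 7 characters plus the terminator, so longer input is truncated rather than written past the end of the buffer.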

Joern can also run queries non-interactively as a script through the Joern CLI, which is useful for scanning many codebases. More about this can be found here: https://docs.joern.io/interpreter. Server mode is useful if you want to build tooling around Joern: https://docs.joern.io/server

Joern Scanning

joern-scan is Joern's built-in scanner. It runs community-provided queries from https://queries.joern.io/

joern-scan --list-languages
Writing logs to: /tmp/joern-scan-log.txt
Available languages (case insensitive):
- golang
- fuzzy_test_lang
- csharp
- java
- php
- c
- kotlin
- ghidra
- javascript
- python
- llvm
- newc
- javasrc

Note: Most of the languages above are only available in Ocular (the paid version of Joern).

To scan a folder run joern-scan /project_to_scan

Other commands:

  • joern-scan --updatedb - Updates built-in query database.
  • joern-scan /file/to/scan --overwrite - Overwrite the existing project CPG; use this after the application code changes.
  • joern-scan /file/to/scan --tags xss,default - Run only the queries with the given tags.

Instead of using the Joern interpreter, you can add a custom query in the expected format to the query database, build it locally, and run it with joern-scan.

$ git clone https://github.com/joernio/query-database/
$ cd query-database
# add your query
$ ./install.sh
$ ./joern-scan /file/to/scan

References

  • https://docs.joern.io/
  • https://github.com/joernio/workshops
  • https://queries.joern.io
This post is licensed under CC BY 4.0 by the author.