Joern Notes
Joern is a static analyzer that builds code property graphs (CPGs) and lets you query them fairly easily. It is a good alternative to CodeQL because analyzing with Joern doesn’t require you to compile or build the project. I often use it when I can’t use CodeQL or Snyk’s internal static analysis engine to analyze a codebase.
Install Joern (Linux)
Pre-requisites
apt install source-highlight graphviz unzip
Setup Joern CLI
mkdir joern && cd joern # optional
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh
chmod u+x joern-install.sh
./joern-install.sh --interactive
Import a Project, create CPG and load to console
joern> importCode("crow")
Using generator for language: NEWC: CCpgGenerator
Creating project `crow` for code at `crow`
moving cpg.bin.zip to cpg.bin because it is already a database file
Creating working copy of CPG to be safe
Loading base CPG from: /home/snoopy/joern-workshop/workspace/crow/cpg.bin.tmp
Code successfully imported. You can now query it using `cpg`.
For an overview of all imported code, type `workspace`.
Adding default overlays to base CPG
The graph has been modified. You may want to use the `save` command to persist changes to disk. All changes will also be saved collectively on exit
res0: Cpg = io.shiftleft.codepropertygraph.generated.Cpg@29345a91
The value “crow” is the folder containing the source code we want to analyze.
Joern keeps every analyzed project in its workspace; type workspace to list them.
joern> workspace
res1: workspacehandling.WorkspaceManager[JoernProject] =
____________________________________________________________________________________________
| name | overlays | inputPath | open |
|===========================================================================================|
| NodeBB1 | | /home/snoopy/joern-workshop/NodeBB | false |
| NodeBB | controlflow,typerel,base,callgraph | /home/snoopy/joern-workshop/NodeBB | false |
open loads a project that has already been analyzed; importCpg loads an existing CPG bin file. Pass open a project name rather than a path, since passing a path is deprecated:
joern> open("/home/snoopy/joern-workshop/NodeBB");
Passing paths to `loadCpg` is deprecated, please use a project name
Creating working copy of CPG to be safe
Loading base CPG from: /home/snoopy/joern-workshop/workspace/NodeBB/cpg.bin.tmp
res2: Option[workspacehandling.Project] = Some(
value = Project(
projectFile = ProjectFile(inputPath = "/home/snoopy/joern-workshop/NodeBB", name = "NodeBB"),
path = /home/snoopy/joern-workshop/workspace/NodeBB,
cpg = Some(value = io.shiftleft.codepropertygraph.generated.Cpg@762e3836)
)
)
joern> open("NodeBB")
Creating working copy of CPG to be safe
Loading base CPG from: /home/snoopy/joern-workshop/workspace/NodeBB/cpg.bin.tmp
res1: Option[workspacehandling.Project] = Some(
value = Project(
projectFile = ProjectFile(
inputPath = "/home/snoopy/joern-workshop/NodeBB",
name = "NodeBB"
),
path = /home/snoopy/joern-workshop/workspace/NodeBB,
cpg = Some(value = io.shiftleft.codepropertygraph.generated.Cpg@1c65740a)
)
)
joern>
Basic Search
Searching
// search for methods with "sanitize" in their name; the argument is a regular expression
joern> cpg.method.name(".*sanitize.*").name.l
res4: List[String] = List("sanitizeSignature", "sanitize")
// dump the code block that matched the search
joern> cpg.method.name(".*find.*").dump
res3: List[String] = List(
"""static Map* find_hash( ino_t ino, dev_t dev, off_t size, time_t ct ); /* <=== */
""",
"""static Map* /* <=== */
find_hash( ino_t ino, dev_t dev, off_t size, time_t ct )
{
unsigned int h, he, i;
Map* m;
h = hash( ino, dev, size, ct );
he = ( h + hash_size - 1 ) & hash_mask;
for ( i = h; ; i = ( i + 1 ) & hash_mask )
{
m = hash_table[i];
if ( m == (Map*) 0 )
break;
if ( m->hash == h && m->ino == ino && m->dev == dev &&
m->size == size && m->ct == ct )
return m;
if ( i == he )
break;
}
return (Map*) 0;
}
Other useful commands
cpg.method.name("parse_public_key_packet").local.name.l
- Find all local variables defined in a method
cpg.method.name("parse_public_key_packet").location.map( x=> (x.lineNumber.get,x.filename)).l
- Find the file and line number where the method is defined
cpg.method.name("parse_public_key_packet").local.typ.name.l.head
- Find the type of the first local variable defined in a method
cpg.method.name("parse_public_key_packet").callOut.name.l
- Find all outgoing calls (call sites) in a method
cpg.method.name("parse_public_key_packet").caller.name.l
- Find which methods call a method
cpg.types.name("vlc_.*").localsOfType.name.l
- List all local variables of type vlc_.*
cpg.types.name("vlc_log_t").map( x=> (x.name, x.start.member.name.l)).l
- Find member variables of a struct
cpg.local.filter(_.typ.name("vlc_log_t")).name.l
- Find local variables and filter them by their type
cpg.local.filter(_.typ.name("vlc_log_t")).method.dump
- Which methods are they used in?
cpg.local.filter(_.typ.name("vlc_log_t")).method.file.name.l
- Get the names of the files containing these methods
cpg.method.where(_.parameter.size > 4).signature.l
- Identify functions with more than 4 parameters
cpg.method.where(_.controlStructure.size > 4).name.l
- Identify functions with more than 4 control structures (a rough proxy for cyclomatic complexity)
cpg.method.where(_.numberOfLines >= 500).name.l
- Identify functions with 500 or more lines of code
cpg.method.where(_.ast.isReturn.l.size > 1).name.l
- Identify functions with multiple return statements
cpg.method.where(_.ast.isControlStructure.parserTypeName("(For|Do|While).*").size >4).name.l
- Identify functions with more than 4 loops
cpg.method.where(_.depth(_.isControlStructure) > 3).name.l
- Identify functions with nesting depth larger than 3
cpg.method.name("find_hash").repeat(_.caller)(_.emit).name.l
- Find the callers of a method, transitively (the whole call chain leading to it)
cpg.method.external.name.l.distinct.sorted
- All names of external methods used by the program
cpg.call("str.*").code.l
- All calls to functions whose name starts with “str”
cpg.call("strcpy").method.name.l
- All methods that call strcpy
cpg.call("sprintf").argument(2).filterNot(_.isLiteral).code.l
- Looking into parameters: calls where the second argument to sprintf is NOT a literal
cpg.call("sprintf").argument(2).filterNot(_.isLiteral).dump
- Quickly view the code matched by the query above
cpg.method.name("parse_public_key_packet").dot |> "/tmp/foo.dot"
- Write the dot representation of the AST of every matching method to a file
Exporting Graphs
Joern can create the following graph representations for C/C++ code:
- Abstract Syntax Trees (AST)
- Control Flow Graphs (CFG)
- Control Dependence Graphs (CDG)
- Data Dependence Graphs (DDG)
- Program Dependence Graphs (PDG)
- Code Property Graphs (CPG14)
Example: plotting the AST of a method (tab completion lists the available plot commands)
joern> cpg.method.name("finish_connection").plotDotAst
plotDotAst plotDotCdg plotDotCfg plotDotCpg14 plotDotDdg plotDotPdg
In most cases, plotDotCpg14 is the most useful graph. It combines the AST, CFG and PDG into a single graph; more about code property graphs can be read here: https://www.sec.cs.tu-bs.de/pubs/2014-ieeesp.pdf
Joern supports many analysis layers (overlays). Some are run by default when code is imported; the others can be run with run <name>. Typing run on its own lists what is available:
joern> run
res1: OverlaysDynamic =
__________________________________________________________________________
| name | description |
|=========================================================================|
| callgraph | Call graph layer |
| controlflow | Control flow layer (including dominators and CDG edges) |
| base | base layer (linked frontend CPG) |
| typerelations | Type relations layer (hierarchy and aliases) |
| dumpast | Dump abstract syntax trees to out/ |
| dumpcfg | Dump control flow graph to out/ |
| dumpcdg | Dump control dependence graph to out/ |
| dumppdg | Dump program dependence graph to out/ |
| scan | Joern Code Scanner |
| dumpddg | Dump data dependence graphs to out/ |
| commit | Apply current custom diffgraph |
| ossdataflow | Layer to support the OSS lightweight data flow tracker |
| dumpcpg14 | Dump Code Property Graph (2014) to out/ |
Run Dataflow analysis
joern> run ossdataflow
The graph has been modified. You may want to use the `save` command to persist changes to disk. All changes will also be saved collectively on exit
res2: Cpg = io.shiftleft.codepropertygraph.generated.Cpg@34e4a136
joern> save
Saving graphs on disk. This may take a while.
Turning working copy into new persistent CPG
Creating working copy of CPG to be safe
Loading base CPG from: /home/snoopy/joern-workshop/workspace/alloc_party/cpg.bin.tmp
res3: List[workspacehandling.Project] = List(
Project(
projectFile = ProjectFile(
inputPath = "/home/snoopy/joern-workshop/alloc_party",
name = "alloc_party"
),
path = /home/snoopy/joern-workshop/workspace/alloc_party,
cpg = Some(value = io.shiftleft.codepropertygraph.generated.Cpg@4a65ce06)
)
)
joern>
Define a source method
joern> def source = cpg.method.name(".*alloc.*").parameter // alternative source: cpg.method.fullName("main").parameter
defined function source
Define a sink method
joern> def sink = cpg.call("malloc").where(_.argument(1).isCallTo(Operators.multiplication)).argument
defined function sink
Code example
void *alloc_havoc(int y) { //source
int z = 10;
void *x = malloc(y * z); //sink
return x;
}
joern> sink.reachableByFlows(source).p
res11: List[String] = List(
"""_______________________________________________________________________________________________________
| tracked | lineNumber| method | file |
|======================================================================================================|
| alloc_havoc(int y) | 11 | alloc_havoc | /home/snoopy/joern-workshop/alloc_party/alloc_party.c |
| y * z | 13 | alloc_havoc | /home/snoopy/joern-workshop/alloc_party/alloc_party.c |
| y * z | 13 | alloc_havoc | /home/snoopy/joern-workshop/alloc_party/alloc_party.c |
"""
)
Another example
joern> def source = cpg.method.fullName("exec").parameter
joern> def sink = cpg.call.name("gets").argument.order(1)
joern> sink.reachableByFlows(source).p
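#include <stdio.h>

/* Assumed to be defined elsewhere in the program. */
int grantAccess(char *input);
void privilegedAction(void);

int exec(char *input) { // source
  int allow = 0;
  gets(input); // sink: unbounded read, user inputs "malicious"
  if (grantAccess(input)) {
    allow = 1;
  }
  if (allow != 0) { // has been overwritten by the overflow of the username
    privilegedAction();
  }
  return 0;
}

int main(void) {
  char username[8];
  printf("Enter your username, please: ");
  scanf("%7s", username); // read at most 7 characters plus the terminator
  exec(username);
  return 0;
}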
Joern can also run queries non-interactively as scripts through the Joern CLI (see https://docs.joern.io/interpreter), which is useful for mass-scanning codebases. Server mode is useful if you want to build tooling around Joern: https://docs.joern.io/server
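As a rough sketch of the scripted mode, the shell snippet below writes a tiny Joern script and runs it against a source folder. The `@main` entry point and the `--script` / `--param` flags follow my reading of the interpreter documentation linked above, and the flag spelling has changed between Joern releases, so treat them as assumptions to check against your installed version. The function name `run_query`, the `JOERN` variable and the script path are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the flags and script format are assumed from the interpreter docs
# and may differ between Joern versions.

JOERN="${JOERN:-joern}"   # hypothetical override, e.g. a full path to the joern binary

run_query() {
  local src="$1" script="${2:-/tmp/list-sanitizers.sc}"
  # Joern script: import the code, then print every method with "sanitize" in its name.
  cat > "$script" <<'EOF'
@main def exec(inputPath: String) = {
  importCode(inputPath)
  cpg.method.name(".*sanitize.*").name.l.foreach(println)
}
EOF
  "$JOERN" --script "$script" --param inputPath="$src"
}

# Example: run_query ./crow
```

Keeping the query in a script file means the same search can be repeated across many projects without opening the interactive console.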
Joern Scanning
Joern ships with a built-in scanner, joern-scan, which runs community-provided queries from https://queries.joern.io/
joern-scan --list-languages
Writing logs to: /tmp/joern-scan-log.txt
Available languages (case insensitive):
- golang
- fuzzy_test_lang
- csharp
- java
- php
- c
- kotlin
- ghidra
- javascript
- python
- llvm
- newc
- javasrc
Note: most of the above languages are only available in Ocular (the paid version of Joern).
To scan a folder, run joern-scan /project_to_scan
Other commands:
joern-scan --updatedb
- Update the built-in query database.
joern-scan /file/to/scan --overwrite
- Overwrite the existing project CPG; run this after the application has changed.
joern-scan /file/to/scan --tags xss,default
- Select which queries to run by tag.
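Putting these invocations together, mass scanning can be sketched as a small shell loop. This is a minimal sketch, assuming every sub-directory of a root folder is a separate project. The function name `scan_all`, the `JOERN_SCAN` override and the one-report-per-project layout are illustrative choices, not part of Joern.

```shell
#!/usr/bin/env bash
# Sketch: run joern-scan over every sub-directory of a root folder.
# JOERN_SCAN, scan_all and the report layout are assumptions for illustration.

JOERN_SCAN="${JOERN_SCAN:-joern-scan}"

scan_all() {
  local root="$1" out="$2" dir name
  mkdir -p "$out"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue            # skip when the glob matches nothing
    name="$(basename "$dir")"
    echo "Scanning $name"
    # --overwrite rebuilds the CPG so stale results are not reused.
    # One report per project; keep going if a single scan fails.
    "$JOERN_SCAN" "${dir%/}" --overwrite > "$out/$name.txt" 2>&1 \
      || echo "scan failed: $name" >&2
  done
}

# Example: scan_all ~/projects ./reports
```

Writing one report file per project keeps a failed or noisy scan from hiding the results of the others.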
Instead of using the Joern interpreter, you can also add a custom query in the expected format, build the query database locally, and use it with joern-scan.
$ git clone https://github.com/joernio/query-database/
$ cd query-database
# add your query
$ ./install.sh
$ ./joern-scan /file/to/scan
References
- https://docs.joern.io/
- https://github.com/joernio/workshops
- https://queries.joern.io