Gremlin (query language)


Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop of the Apache Software Foundation. Gremlin works with both OLTP-based graph databases and OLAP-based graph processors. Its foundation in automata and functional languages lets Gremlin naturally support imperative and declarative querying, host language agnosticism, user-defined domain-specific languages, an extensible compiler/optimizer, single- and multi-machine execution models, hybrid depth- and breadth-first evaluation, and Turing completeness.
As an explanatory analogy, Apache TinkerPop and Gremlin are to graph databases what JDBC and SQL are to relational databases. Likewise, the Gremlin traversal machine is to graph computing what the Java virtual machine is to general-purpose computing.

History

Gremlin is an Apache 2.0-licensed graph traversal language that graph system vendors can adopt. Graph system vendors typically fall into two types: OLTP graph databases and OLAP graph processors. The table below lists the graph vendors that support Gremlin.
Vendor                Graph System
Neo4j                 graph database
OrientDB              graph database
DataStax Enterprise   graph database
Hadoop (Giraph)       graph processor
Hadoop (Spark)        graph processor
InfiniteGraph         graph database
JanusGraph            graph database
Cosmos DB             graph database
Amazon Neptune        graph database

Traversal examples

The following examples of Gremlin queries and responses in a Gremlin-Groovy environment run against a graph representation of the MovieLens dataset. The dataset consists of users who rate movies. Each user has one occupation, and each movie has one or more categories associated with it. The MovieLens graph schema is detailed below.

user--rated-->movie
user--occupation-->occupation
movie--category-->category
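
To make the schema concrete, here is a minimal sketch in plain Python (not Gremlin) of a toy property graph with the same vertex and edge labels. All ids, names, and helper functions (`label_counts`, `out`) are hypothetical and exist only for illustration; they are not part of TinkerPop or of the MovieLens dataset.

```python
from collections import Counter

# Toy property graph shaped like the schema above. Every id, name, and
# property key below is invented for illustration.
vertices = {
    1: {"label": "user"},
    2: {"label": "user"},
    3: {"label": "movie", "name": "Movie A"},
    4: {"label": "movie", "name": "Movie B"},
    5: {"label": "occupation", "name": "programmer"},
    6: {"label": "category", "name": "Action"},
}

# Directed, labeled edges stored as (out-vertex id, edge label, in-vertex id).
edges = [
    (1, "rated", 3),
    (1, "rated", 4),
    (2, "rated", 3),
    (1, "occupation", 5),
    (2, "occupation", 5),
    (3, "category", 6),
    (4, "category", 6),
]


def label_counts():
    """Count vertices per label."""
    return Counter(v["label"] for v in vertices.values())


def out(vertex_id, edge_label):
    """Follow outgoing edges with the given label and return the target ids."""
    return [dst for src, lbl, dst in edges if src == vertex_id and lbl == edge_label]


print(dict(label_counts()))
print(out(1, "rated"))
```

A traversal language expresses the same kind of step-by-step movement (start at vertices, follow labeled edges, aggregate) without the caller writing the loops by hand.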

Simple traversals


gremlin> g.V().label().groupCount()
==>[occupation:21, movie:3883, category:18, user:6040]



gremlin> g.V().hasLabel('movie').values('year').min()
==>1919



gremlin> g.V().has('movie','name','Die Hard').inE('rated').values('stars').mean()
==>4.121848739495798

Projection traversals


gremlin> g.V().hasLabel('category').as('a','b').
           select('a','b').
             by('name').
             by(inE('category').count())

==>[a:Animation, b:105]
==>[a:Children's, b:251]
==>[a:Comedy, b:1200]
==>[a:Adventure, b:283]
==>[a:Fantasy, b:68]
==>[a:Romance, b:471]
==>[a:Drama, b:1603]
==>[a:Action, b:503]
==>[a:Crime, b:211]
==>[a:Thriller, b:492]
==>[a:Horror, b:343]
==>[a:Sci-Fi, b:276]
==>[a:Documentary, b:127]
==>[a:War, b:143]
==>[a:Musical, b:114]
==>[a:Mystery, b:106]
==>[a:Film-Noir, b:44]
==>[a:Western, b:68]



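gremlin> g.V().hasLabel('movie').as('a','b').
           where(inE('rated').count().is(gt(10))).
           select('a','b').
             by('name').
             by(inE('rated').values('stars').mean()).
           order().by(select('b'),desc).
           limit(10)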

==>[a:Sanjuro, b:4.608695652173913]
==>[a:Seven Samurai (The Magnificent Seven), b:4.560509554140127]
==>[a:Shawshank Redemption, The, b:4.554557700942973]
==>[a:Godfather, The, b:4.524966261808367]
==>[a:Close Shave, A, b:4.52054794520548]
==>[a:Usual Suspects, The, b:4.517106001121705]
==>[a:Schindler's List, b:4.510416666666667]
==>[a:Wrong Trousers, The, b:4.507936507936508]
==>[a:Sunset Blvd. (a.k.a. Sunset Boulevard), b:4.491489361702127]
==>[a:Raiders of the Lost Ark, b:4.47772]
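
The filter, project, order, and limit pipeline above can be sketched in plain Python as follows. This is only an illustration of the logic, not Gremlin and not how a Gremlin engine evaluates it; the sample ratings, the threshold, and the function name `top_rated` are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical (movie name, stars) pairs standing in for "rated" edges.
ratings = [
    ("Movie A", 5), ("Movie A", 4), ("Movie A", 5),
    ("Movie B", 3), ("Movie B", 4), ("Movie B", 2),
    ("Movie C", 5),
]


def top_rated(pairs, min_ratings, limit):
    """Return up to `limit` maps {'a': name, 'b': mean stars}, best mean first,
    keeping only movies with more than `min_ratings` ratings."""
    stars_by_movie = defaultdict(list)
    for name, stars in pairs:
        stars_by_movie[name].append(stars)
    rows = [
        {"a": name, "b": sum(stars) / len(stars)}
        for name, stars in stars_by_movie.items()
        if len(stars) > min_ratings
    ]
    # Sort in decreasing order of the mean, then clip to the requested size.
    rows.sort(key=lambda row: row["b"], reverse=True)
    return rows[:limit]


result = top_rated(ratings, min_ratings=2, limit=2)
print(result)
```

Filtering on the number of ratings before averaging keeps a movie with a single enthusiastic rating (here the made-up "Movie C") from topping the list.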

Declarative pattern matching traversals

Gremlin supports declarative graph pattern matching similar to SPARQL. For instance, the following query uses Gremlin's match()-step.

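gremlin> g.V().
           match(
             __.as('a').hasLabel('movie'),
             __.as('a').out('category').has('name','Action'),
             __.as('a').has('year',between(1980,1990)),
             __.as('a').inE('rated').as('b'),
             __.as('b').has('stars',5),
             __.as('b').outV().as('c'),
             __.as('c').out('occupation').has('name','programmer'),
             __.as('c').has('age',between(30,40))).
           select('a').groupCount().by('name').
           order(local).by(values,desc).
           limit(local,10)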

==>Raiders of the Lost Ark=26
==>Star Wars Episode V - The Empire Strikes Back=26
==>Terminator, The=23
==>Star Wars Episode VI - Return of the Jedi=22
==>Princess Bride, The=19
==>Aliens=18
==>Boat, The (Das Boot)=11
==>Indiana Jones and the Last Crusade=11
==>Star Trek The Wrath of Khan=10
==>Abyss, The=9
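
Declarative matching can be pictured as a set of constraints over named variables, where the order in which the constraints are written does not matter. The plain-Python sketch below illustrates that idea only; it is not how the match-step is implemented. The sample data, the property names (`year`, `age`), and the specific constraints are assumptions chosen for the example, with `a` bound to a movie, `b` to a rating, and `c` to a user.

```python
from collections import Counter

# Invented sample data; property names are assumptions.
movies = {
    "m1": {"name": "Movie A", "year": 1984, "categories": {"Action"}},
    "m2": {"name": "Movie B", "year": 1995, "categories": {"Action"}},
    "m3": {"name": "Movie C", "year": 1986, "categories": {"Drama"}},
}
users = {
    "u1": {"age": 33, "occupation": "programmer"},
    "u2": {"age": 35, "occupation": "programmer"},
    "u3": {"age": 52, "occupation": "artist"},
}
# "rated" edges as (user id, movie id, stars).
rated = [
    ("u1", "m1", 5), ("u2", "m1", 5), ("u3", "m1", 5),
    ("u1", "m2", 5), ("u2", "m3", 5), ("u1", "m3", 4),
]

# Each pattern is a predicate over one binding of a (movie id),
# b (stars on the rating edge), and c (user id). Order is irrelevant.
patterns = [
    lambda a, b, c: "Action" in movies[a]["categories"],
    lambda a, b, c: 1980 <= movies[a]["year"] < 1990,
    lambda a, b, c: b == 5,
    lambda a, b, c: users[c]["occupation"] == "programmer",
    lambda a, b, c: 30 <= users[c]["age"] < 40,
]


def match_counts(limit=10):
    """Count, per movie name, the bindings that satisfy every pattern."""
    counts = Counter()
    for user_id, movie_id, stars in rated:
        if all(p(movie_id, stars, user_id) for p in patterns):
            counts[movies[movie_id]["name"]] += 1
    # most_common sorts by count in decreasing order and clips to `limit`.
    return counts.most_common(limit)


print(match_counts())
```

Because the patterns are independent predicates, an engine is free to reorder them, for instance evaluating the most selective constraint first, which is one motivation for offering a declarative form alongside imperative traversals.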

OLAP traversal


gremlin> g = graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]

gremlin> g.V().repeat(outE('rated').has('stars', 5).inV().
                 groupCount('m').by('name').
                 inE('rated').has('stars', 5).outV()).
               times(4).cap('m')
==>Star Wars Episode IV - A New Hope
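
The repeat, groupCount, times, and cap steps above can be read as a repeated walk from users to movies and back, counting every arrival at a movie. The following plain-Python sketch shows that counting scheme on invented data, assuming the filter keeps only 5-star ratings. It runs on one machine and says nothing about how a graph processor distributes the work; the function name `five_star_centrality` is hypothetical.

```python
from collections import Counter

# Invented 5-star "rated" edges as (user, movie name).
five_star = [
    ("u1", "Movie A"),
    ("u2", "Movie A"),
    ("u2", "Movie B"),
    ("u3", "Movie A"),
]


def five_star_centrality(edges, times):
    """Count arrivals at each movie over `times` user -> movie -> user rounds."""
    liked_by_user = {}
    fans_by_movie = {}
    for user, movie in edges:
        liked_by_user.setdefault(user, []).append(movie)
        fans_by_movie.setdefault(movie, []).append(user)

    # One walker starts at each user; a Counter tracks how many walkers
    # currently sit at each vertex instead of tracking them one by one.
    at_users = Counter(liked_by_user.keys())
    visits = Counter()
    for _ in range(times):
        # Step 1: every walker moves to each movie its user rated 5 stars.
        at_movies = Counter()
        for user, n in at_users.items():
            for movie in liked_by_user[user]:
                at_movies[movie] += n
        # Side effect: record the arrivals (the group count).
        visits.update(at_movies)
        # Step 2: every walker moves back to each user who rated that movie.
        at_users = Counter()
        for movie, n in at_movies.items():
            for user in fans_by_movie[movie]:
                at_users[user] += n
    return visits


print(dict(five_star_centrality(five_star, 2)))
```

Movies reached by many walks accumulate large counts quickly, which is why counts in such a traversal grow rapidly with the number of iterations.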