Teaching programming is harder than it looks. Students struggle with syntax errors, cryptic compiler messages, and the gap between thinking through a problem and writing working code. I wanted to bridge that gap.

StepCode started as a personal project while I was teaching Algorithms and Programming. We used pseudocode in class because it lets students focus on logic without getting lost in language quirks. But pseudocode on paper cannot be executed, so students had no way to test their ideas.

So I built an interpreter. One that runs pseudocode directly in the browser.

What StepCode Does

StepCode is a pseudocode language inspired by PSeInt, a tool popular in Latin America for teaching programming. The language supports both Spanish and English keywords. You can write SI or IF, MIENTRAS or WHILE. This makes it accessible to Spanish-speaking students who are just starting out.

Here is a simple program that greets the user:

Proceso Saludo
    Definir nombre Como Cadena;
    Escribir "Como te llamas?";
    Leer nombre;
    Escribir "Hola, ", nombre, "!";
FinProceso

The interpreter handles variables, loops, conditionals, arrays, functions, and procedures. It even supports pass-by-reference parameters for functions. Everything runs in JavaScript, so it works in any browser without installation.

The Architecture

Building a programming language interpreter involves three main pieces: a lexer, a parser, and the interpreter itself.

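The lexer breaks source code into tokens. The parser arranges those tokens into a tree that mirrors the grammar. ANTLR4 calls this a parse tree, and it plays the role an Abstract Syntax Tree (AST) would play in a hand-written front end. The interpreter walks that tree and executes the code.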

I used ANTLR4 to generate the lexer and parser. ANTLR4 is a parser generator that takes a grammar file and produces source code for parsing. You write the grammar rules once, and ANTLR4 handles the rest.

The Grammar File

The grammar file defines two things: what tokens look like and how they combine into valid programs.

Tokens are the basic building blocks. Keywords, operators, numbers, strings. Here is how I defined the bilingual keywords:

IF: 'SI' | 'IF';
WHILE: 'WHILE' | 'MIENTRAS';
PROGRAM: 'PROCESO' | 'ALGORITMO' | 'PROGRAM';

ANTLR4 treats any of these alternatives as the same token. The grammar is also case-insensitive, so if, IF, and If all work.
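
To make the idea concrete outside of ANTLR4, here is a minimal sketch of the same mapping as a plain lookup table. It is not how StepCode or ANTLR4 implements it; KEYWORDS and keywordToken are hypothetical names, and only the three rules shown above are modeled.

```typescript
// Hypothetical sketch: each accepted spelling maps to one token type,
// mirroring the three lexer rules shown above. Not StepCode's actual code.
type TokenType = "IF" | "WHILE" | "PROGRAM";

const KEYWORDS: ReadonlyMap<string, TokenType> = new Map<string, TokenType>([
  ["SI", "IF"],
  ["IF", "IF"],
  ["MIENTRAS", "WHILE"],
  ["WHILE", "WHILE"],
  ["PROCESO", "PROGRAM"],
  ["ALGORITMO", "PROGRAM"],
  ["PROGRAM", "PROGRAM"],
]);

// Upper-casing before the lookup is what makes matching case-insensitive.
function keywordToken(word: string): TokenType | undefined {
  return KEYWORDS.get(word.toUpperCase());
}
```

With this table, keywordToken("si") and keywordToken("If") both yield the IF token, while an ordinary identifier yields undefined.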

Parser rules describe how tokens fit together. A program has a main block and optional subprograms:

program: directives* subprogram* main subprogram* EOF;
main: programHeading block ENDPROGRAM;

An if statement looks like this in the grammar:

ifStatement: IF expression THEN compoundStatement (elifStatement | elseStatement?) ENDIF;

The grammar supports all the control structures you would expect. FOR loops with optional step values. WHILE loops. REPEAT/UNTIL loops. CASE statements for multi-way branching.

The Visitor Pattern Interpreter

ANTLR4 generates a base visitor class with empty methods for each grammar rule. I extended this class and filled in the logic for each node type.

The interpreter uses async/await throughout. This matters for I/O operations. When the code calls LEER (read input), the interpreter needs to wait for user input. Running everything asynchronously keeps the browser responsive.

Here is how the if statement visitor works:

visitIfStatement = async (ctx: IfStatementContext) => {
  // Evaluate the condition before choosing a branch.
  const expression = await this.visit(ctx.expression())
  if (expression.value) {
    return await this.visit(ctx.compoundStatement())
  } else {
    // Condition was false: try the else-if branch first, then the else block.
    if (ctx.elifStatement()) {
      return await this.visit(ctx.elifStatement())
    }
    if (ctx.elseStatement()) {
      return await this.visit(ctx.elseStatement())
    }
  }
}

It evaluates the condition first. If true, it visits the then-block. Otherwise, it checks for else-if or else blocks.

Managing State

Variables live in a symbol table implemented as a Map. Each variable stores its type and value. When the interpreter enters a function or procedure, it pushes a new scope onto a call stack. This keeps local variables separate from global ones.
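
A minimal sketch of that structure might look like the following. It is an illustration, not StepCode's source: ScopeStack and VariableEntry are hypothetical names, and falling back to globals when a local lookup misses is an assumption.

```typescript
// Hypothetical sketch of a scoped symbol table; names are illustrative.
interface VariableEntry {
  type: string; // e.g. "Entero" or "Cadena"
  value: unknown;
}

type Scope = Map<string, VariableEntry>;

class ScopeStack {
  private readonly globals: Scope = new Map();
  private readonly frames: Scope[] = [];

  // Called when entering a function or procedure.
  pushFrame(): void {
    this.frames.push(new Map());
  }

  // Called when leaving it; the frame's locals disappear with it.
  popFrame(): void {
    this.frames.pop();
  }

  // New variables go in the innermost frame, or in globals at top level.
  define(varName: string, entry: VariableEntry): void {
    this.current().set(varName, entry);
  }

  // Assumption: look in the current frame first, then fall back to globals.
  lookup(varName: string): VariableEntry | undefined {
    return this.current().get(varName) ?? this.globals.get(varName);
  }

  private current(): Scope {
    return this.frames[this.frames.length - 1] ?? this.globals;
  }
}
```

Because each call gets a fresh Map, a local defined inside one call is gone as soon as popFrame runs.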

The interpreter enforces a call stack limit of 100 frames to catch runaway recursion. Without this, a subprogram that keeps calling itself would exhaust the JavaScript call stack or hang the browser.
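
One way to enforce such a limit is a depth counter wrapped around every subprogram call. The sketch below is synchronous for brevity, while the real interpreter is async; CallDepthGuard and MAX_CALL_DEPTH are hypothetical names.

```typescript
// Hypothetical sketch: a depth counter guarding every subprogram call.
const MAX_CALL_DEPTH = 100; // the limit mentioned above

class CallDepthGuard {
  private depth = 0;

  // Runs `body` one level deeper, failing fast once the limit is reached.
  call<T>(body: () => T): T {
    if (this.depth >= MAX_CALL_DEPTH) {
      throw new Error(`Call stack limit of ${MAX_CALL_DEPTH} frames exceeded`);
    }
    this.depth++;
    try {
      return body();
    } finally {
      this.depth--; // always unwind, even when body throws
    }
  }
}
```

The finally block matters: without it, one failed call would leave the counter inflated for the rest of the program.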

Pass By Reference

StepCode supports passing arguments by reference. This was tricky to implement. When a parameter is marked POR REFERENCIA (by reference), changes inside the function affect the original variable.

The interpreter handles this by storing a reference to the original variable entry in the symbol table rather than copying the value. Array element references are even more complex. They need to track which array and which index to modify.
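
A simplified way to picture both cases is a small reference object with a getter and a setter. This is a sketch under my own naming (Ref, variableRef, elementRef, and duplicar are hypothetical), not the interpreter's actual data structures.

```typescript
// Hypothetical sketch: a reference is a pair of get/set closures.
interface Ref {
  get(): unknown;
  set(value: unknown): void;
}

interface Entry {
  value: unknown;
}

// Reference to a whole variable: shares the original entry, no copy.
function variableRef(entry: Entry): Ref {
  return {
    get: () => entry.value,
    set: (v) => {
      entry.value = v;
    },
  };
}

// Reference to one array element: remembers which array and which index.
function elementRef(items: unknown[], index: number): Ref {
  return {
    get: () => items[index],
    set: (v) => {
      items[index] = v;
    },
  };
}

// Stand-in for a subprogram with a by-reference parameter:
// it doubles whatever numeric target it was handed.
function duplicar(target: Ref): void {
  target.set((target.get() as number) * 2);
}
```

The subprogram never needs to know whether it received a plain variable or an array element. It only sees get and set.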

Built-in Functions

The interpreter includes common functions for string manipulation and math:

  • LONGITUD / LENGTH - string or array length
  • SUBCADENA / SUBSTRING - extract part of a string
  • MAYUSCULAS / UPPER - convert to uppercase
  • MINUSCULAS / LOWER - convert to lowercase
  • TRUNCAR / TRUNC - remove decimal part
  • REDONDEAR / ROUND - round to nearest integer
  • ALEATORIO / RANDOM - random number
  • CONVERTIRANUMERO / TONUM - parse string to number
  • CONVERTIRATEXTO / TOSTR - convert to string

Each function works with both Spanish and English names.
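
One simple way to get that behavior is to register a single implementation under both names. The sketch below covers four of the functions above; BUILTINS, register, and callBuiltin are hypothetical names, and details such as how REDONDEAR treats ties may differ from StepCode.

```typescript
// Hypothetical sketch: one implementation registered under both names.
type Builtin = (args: unknown[]) => unknown;

const BUILTINS = new Map<string, Builtin>();

function register(spanish: string, english: string, fn: Builtin): void {
  BUILTINS.set(spanish, fn);
  BUILTINS.set(english, fn);
}

register("MAYUSCULAS", "UPPER", ([s]) => String(s).toUpperCase());
register("MINUSCULAS", "LOWER", ([s]) => String(s).toLowerCase());
register("TRUNCAR", "TRUNC", ([n]) => Math.trunc(Number(n)));
// Math.round rounds ties toward positive infinity; StepCode may differ.
register("REDONDEAR", "ROUND", ([n]) => Math.round(Number(n)));

// Looks the function up case-insensitively and applies it.
function callBuiltin(fnName: string, args: unknown[]): unknown {
  const fn = BUILTINS.get(fnName.toUpperCase());
  if (!fn) throw new Error(`Unknown function: ${fnName}`);
  return fn(args);
}
```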

The Event Bus

I/O works through an event bus. This decouples the interpreter from any specific UI. The interpreter emits events for output and listens for input responses.

eventBus.on('output-request', (message: string) => {
  console.log(message);
});

eventBus.on('input-request', (resolve: (s: string) => void) => {
  // Get input from user, then call resolve(input)
});

This design lets the same interpreter run in different environments. A terminal app, a web page, or a test suite can all plug into the event bus.
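
To show how input can be awaited through such a bus, here is a self-contained sketch. MiniEventBus is a hypothetical stand-in that models only on and emit with the two events shown above. It is not the EventBus class exported by the package, and requestInput is my guess at what a LEER visitor might do.

```typescript
// Hypothetical stand-in for an event bus; only on/emit are modeled.
interface BusEvents {
  "output-request": string;
  "input-request": (answer: string) => void;
}

class MiniEventBus {
  // Handlers are stored loosely; the typed on/emit signatures keep callers safe.
  private readonly handlers = new Map<string, Array<(payload: any) => void>>();

  on<K extends keyof BusEvents>(event: K, handler: (payload: BusEvents[K]) => void): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  emit<K extends keyof BusEvents>(event: K, payload: BusEvents[K]): void {
    for (const handler of this.handlers.get(event) ?? []) {
      handler(payload);
    }
  }
}

// What a LEER visitor might do: emit the request, then await the answer.
// The promise settles only when some listener calls the resolve callback.
function requestInput(bus: MiniEventBus): Promise<string> {
  return new Promise<string>((resolve) => bus.emit("input-request", resolve));
}
```

A test suite can answer the input request immediately with a canned string, while a web page can hold on to the callback until the user presses Enter. The interpreter code is the same in both cases.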

Array Support

Arrays in StepCode are 1-indexed by default, matching pseudocode conventions. A directive lets you switch to 0-indexed arrays if needed:

$arrays@stepcode
Proceso Test
    Definir a Como Entero;
    Dimension a[5];
    a[0] <- 10;
FinProceso

The DIMENSION statement creates arrays. Multi-dimensional arrays work too. The interpreter creates nested JavaScript arrays under the hood.
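
The two pieces involved, allocation and index translation, can be sketched as follows. createArray and toJsIndex are hypothetical helpers, and filling new cells with 0 is an assumption made for the example.

```typescript
// Hypothetical sketch of DIMENSION-style allocation and index translation.

// Builds nested arrays, e.g. createArray([2, 3]) gives 2 rows of 3 cells.
function createArray(dims: number[], fill: unknown = 0): unknown[] {
  const [size, ...rest] = dims;
  if (size === undefined) {
    throw new Error("At least one dimension is required");
  }
  // Array.from calls the callback per slot, so every row is a fresh array.
  return Array.from({ length: size }, () =>
    rest.length > 0 ? createArray(rest, fill) : fill
  );
}

// Converts a user-facing index to a JavaScript index and checks bounds.
// `base` is 1 for the default mode and 0 when the directive is active.
function toJsIndex(index: number, size: number, base: 0 | 1): number {
  const jsIndex = index - base;
  if (!Number.isInteger(jsIndex) || jsIndex < 0 || jsIndex >= size) {
    throw new RangeError(`Index ${index} is out of range`);
  }
  return jsIndex;
}
```

In this sketch, a[0] is rejected in 1-indexed mode and accepted in 0-indexed mode, which matches the behavior the directive is meant to toggle.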

Error Handling

The parser catches syntax errors. Missing semicolons, unbalanced parentheses, unknown keywords. ANTLR4 provides line and column numbers for error messages.

Runtime errors need custom handling. The interpreter throws StepCodeError with location information when it hits problems like undefined variables, type mismatches, or stack overflow.
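
As an illustration, an error type carrying a location could be shaped like this. The class below reuses the StepCodeError name, but its fields and message format are assumptions, and lookupVariable is a hypothetical helper. The class in the package may look different.

```typescript
// Hypothetical shape for a runtime error with a source location.
// The real StepCodeError may carry different fields.
class StepCodeError extends Error {
  readonly line: number;
  readonly column: number;

  constructor(message: string, line: number, column: number) {
    super(`${message} (line ${line}, column ${column})`);
    this.name = "StepCodeError";
    this.line = line;
    this.column = column;
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, StepCodeError.prototype);
  }
}

// Example of raising it: reading a variable that was never defined.
function lookupVariable(
  table: Map<string, unknown>,
  varName: string,
  line: number,
  column: number
): unknown {
  if (!table.has(varName)) {
    throw new StepCodeError(`Undefined variable '${varName}'`, line, column);
  }
  return table.get(varName);
}
```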

Challenges I Faced

Getting scope right was the hardest part. Variables inside a function should not leak outside. Pass-by-reference needs to modify the original. Nested function calls need their own clean scopes.

ANTLR4 grammar ambiguity caused headaches too. Expression parsing needs careful precedence rules. Does 2 + 3 * 4 equal 14 or 20? The grammar structure determines this, and getting it wrong causes subtle bugs.
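
The effect of precedence is easy to see in a tiny hand-written evaluator. The sketch below uses precedence climbing, which is a different technique from the ANTLR4 grammar rules StepCode relies on; evaluate, applyOp, and PRECEDENCE are names invented for this example.

```typescript
// Hypothetical sketch: precedence climbing over a tiny arithmetic language.
const PRECEDENCE: Record<string, number> = { "+": 1, "-": 1, "*": 2, "/": 2 };

function applyOp(op: string, a: number, b: number): number {
  switch (op) {
    case "+": return a + b;
    case "-": return a - b;
    case "*": return a * b;
    case "/": return a / b;
    default: throw new Error(`Unknown operator '${op}'`);
  }
}

function evaluate(source: string): number {
  // Characters that match neither pattern (such as spaces) are skipped.
  const tokens: string[] = source.match(/\d+(?:\.\d+)?|[-+*/()]/g) ?? [];
  let pos = 0;

  function parsePrimary(): number {
    const token = tokens[pos++];
    if (token === undefined) throw new Error("Unexpected end of input");
    if (token === "(") {
      const inner = parseExpression(1);
      if (tokens[pos++] !== ")") throw new Error("Expected ')'");
      return inner;
    }
    const value = Number(token);
    if (Number.isNaN(value)) throw new Error(`Unexpected token '${token}'`);
    return value;
  }

  // Only consumes operators whose precedence is at least minPrec.
  function parseExpression(minPrec: number): number {
    let left = parsePrimary();
    for (;;) {
      const op = tokens[pos];
      const prec = op === undefined ? undefined : PRECEDENCE[op];
      if (op === undefined || prec === undefined || prec < minPrec) return left;
      pos++;
      // Requiring prec + 1 on the right makes operators left-associative.
      const right = parseExpression(prec + 1);
      left = applyOp(op, left, right);
    }
  }

  const result = parseExpression(1);
  if (pos < tokens.length) throw new Error("Unexpected trailing input");
  return result;
}
```

With these precedence levels, evaluate("2 + 3 * 4") returns 14. Giving both operators the same level would make it return 20, which is exactly the kind of subtle bug a misordered grammar produces.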

The bilingual design created some token conflicts. Some Spanish words overlapped with English in unexpected ways. I had to restructure parts of the grammar to avoid ambiguity.

What Makes StepCode Useful

StepCode fills a specific niche. Students can write pseudocode and actually run it. They get immediate feedback on their logic. They can debug step by step.

The bilingual support matters for accessibility. Learning programming is hard enough without fighting an unfamiliar language at the same time.

The browser-based runtime means zero setup. No installs, no environment configuration, no IT support requests. Open a web page and start coding.

Using StepCode

Install from npm:

npm install stepcode

Then interpret code:

import { interpret, EventBus } from 'stepcode';

const eventBus = new EventBus();

eventBus.on('output-request', (message: string) => {
  console.log(message);
});

await interpret({
  code: `Proceso HolaMundo
      Escribir "Hola mundo";
  FinProceso`,
  eventBus
});

The validate function checks syntax without executing:

import { validate } from 'stepcode';

const errors = validate(`Proceso Test
    Escribir "Missing semicolon"
FinProceso`);

What I Learned

Building a programming language taught me more about parsers and interpreters than any textbook could. I now understand why certain design decisions matter. Why scope rules exist. Why type systems help.

The project also reinforced that tools matter for learning. Giving students a way to run their pseudocode made a real difference in my classes.
