
Hello! I am Daniel. I'm a dual-degree Ph.D. student in Computer Science and Software Engineering at Instituto Superior Técnico (IST) and Carnegie Mellon University (CMU). I'm currently at IST in Portugal! My Ph.D. is supported by a CMU Portugal fellowship, and my expected graduation date is 2025. I'm co-advised by Claire Le Goues, Ruben Martins, and Vasco Manquinho. My research aims to help developers by automating tedious (but necessary!) refactoring tasks. Specifically, I work on automatic library migration, fixing breaking changes, and code generation and maintenance tasks in general. Check out my papers and projects below to learn more!

Previously, I worked on Program Synthesis as a Research Assistant at INESC-ID. I have a master's degree in Software Engineering from the School of Computer Science at Carnegie Mellon University.

Short Résumé

Papers

A Lightweight Polyglot Code Transformation Language (PLDI'24).

To address the challenges of maintaining large, multi-language codebases, we've developed a novel domain-specific language (DSL) that bridges the gap between language-specific tools and generic, less expressive options. This DSL, implemented in our open-source tool, PolyglotPiranha, allows users to write advanced code transformations and tooling in a language-agnostic way.


BatFix: Repairing language model-based transpilation (ACM TOSEM'24).

Large Language Models (LLMs) have proven useful for generating and translating code (e.g., mapping APIs across languages). However, LLMs can sometimes produce buggy code. BatFix combines LLMs' code generation strengths with formal methods, ensuring the output is both creative and reliable while minimizing errors.


MELT: Mining Effective Lightweight Transformations from Pull Requests (ASE'23).

MELT introduces a novel approach to API refactoring, generating lightweight, interpretable migration rules directly from pull requests submitted to library repositories. The key innovation of MELT is that it leverages Large Language Models (LLMs) to understand transitions between API versions, drawing insights from both discussion descriptions and code changes. This enables MELT to effectively bridge gaps where migration examples might be sparse (as in most cases). In contrast to SOAR's comprehensive synthesis approach, MELT provides a lightweight, scalable method, emphasizing real-world applicability and ease of integration.


SOAR: A Synthesis Approach for Data Science API Refactoring (ICSE'21).

With the growth of the open-source data science community, both the number of data science libraries and the number of versions for the same library are increasing rapidly. To match the evolving APIs from those libraries, open-source organizations often have to exert manual effort to refactor the APIs used in the code base. SOAR aims to automate these refactoring tasks.


UnchartIt: An Interactive Framework for Program Recovery from Charts (ASE'20).

UnchartIt is the first program synthesizer to recover data transformations from chart images. Given an input table and a chart, UnchartIt automatically recovers the data transformations in four steps: data extraction, candidate generation, candidate ranking, and candidate disambiguation.

Projects

Polyglot Piranha at Uber - A Flexible Multilingual Framework for Code Refactoring.

Piranha is a flexible multilingual framework designed for chaining interdependent structural search/replace rules using any matching language. It allows users to find and replace code using any declarative matching language, such as regex or tree-sitter queries. With Piranha, users can specify additional matchers in a filter language to achieve fine-grained rewrites. One of its powerful features is the ability to chain rewrite rules using a graph language, enabling cascading program transformations. Piranha then automatically refactors the code based on the chaining strategy. Furthermore, it can generate rewrite rules from templates and annotations.
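As a rough illustration of the rule-chaining idea (this is a toy sketch in plain Python with regexes, not Piranha's actual API or rule format), the snippet below applies a seed rule and then follows graph edges to cleanup rules only when the seed rule matched. The rule names, the `flags.enabled("new_ui")` call, and the `apply_chain` helper are all hypothetical.

```python
import re

# Hypothetical rules: name -> (regex pattern, replacement).
RULES = {
    "replace_flag": (r'flags\.enabled\("new_ui"\)', "True"),
    "simplify_not": (r"\bnot True\b", "False"),
    "simplify_and": (r"\bTrue and ", ""),
}

# Hypothetical rule graph: an edge means "if this rule rewrote something,
# try these follow-up rules next".
EDGES = {"replace_flag": ["simplify_not", "simplify_and"]}


def apply_chain(source: str, start: str) -> str:
    """Apply the start rule, then cascade along edges of rules that matched."""
    worklist = [start]
    while worklist:
        name = worklist.pop(0)
        pattern, replacement = RULES[name]
        source, count = re.subn(pattern, replacement, source)
        if count:  # only follow edges when the rule actually fired
            worklist.extend(EDGES.get(name, []))
    return source


print(apply_chain('ok = flags.enabled("new_ui") and user.is_admin', "replace_flag"))
print(apply_chain('legacy = not flags.enabled("new_ui")', "replace_flag"))
```

Regexes are used here only to keep the sketch self-contained; structural matchers such as tree-sitter queries avoid the precedence and formatting pitfalls that text-level patterns run into.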

During my internship at Uber (12 weeks), I made significant technical contributions to Piranha (42 PRs merged), including:

  1. Together with my mentor, I advocated for a matching-language-agnostic approach for Piranha (initially, only tree-sitter was supported). We designed and implemented a comby-style language for match-replace, named "Piranha Concrete Syntax".
  2. Improved Piranha's filter language for the composition of matchers, allowing for enhanced precision in code matching.
  3. Introduced a mechanism to infer tree-sitter match-replace rules from examples, simplifying the rule creation process.
  4. Created a playground interface for Piranha, enabling users to experiment and tinker with Piranha rules.

UnchartIt Distinguisher - Program Distinguisher for Data Science.

Programming-By-Example (PBE) is the task of automatically generating a program from a set of input-output examples. One major concern in PBE is that specifying user intent through examples usually leads to ambiguity. Thus, there might be multiple non-equivalent programs that satisfy the examples the user provides. Moreover, these programs can behave very differently when given a different set of inputs. UnchartIt Distinguisher allows for the disambiguation of such programs through two different user interaction models.
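To make the ambiguity concrete, here is a minimal sketch (not UnchartIt Distinguisher's implementation; the candidate programs, examples, and helper names are made up for illustration). Three toy programs all satisfy the same input-output examples, and a brute-force search over small inputs finds one on which they all disagree, which is the kind of input a user could then be asked about.

```python
from itertools import product

# Hypothetical candidate programs over lists of integers.
CANDIDATES = {
    "max": lambda xs: max(xs),
    "last": lambda xs: xs[-1],
    "sum": lambda xs: sum(xs),
}

# Hypothetical user-provided examples: (input, expected output).
EXAMPLES = [([0, 3], 3), ([0, 0, 5], 5)]


def consistent(candidates, examples):
    """Keep only the programs that reproduce every example."""
    return {
        name: f
        for name, f in candidates.items()
        if all(f(inp) == out for inp, out in examples)
    }


def distinguishing_input(candidates, max_len=3, values=range(3)):
    """Return the first small input on which all candidates produce
    pairwise different outputs, or None if no such input is found."""
    for n in range(1, max_len + 1):
        for xs in product(values, repeat=n):
            outputs = [f(list(xs)) for f in candidates.values()]
            if len(set(outputs)) == len(outputs):
                return list(xs)
    return None


survivors = consistent(CANDIDATES, EXAMPLES)
print(sorted(survivors))                # all three programs fit the examples
print(distinguishing_input(survivors))  # an input that tells them apart
```

Exhaustive enumeration only works for tiny input spaces; it stands in here for whatever search procedure a real distinguisher would use.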