About Xonai

Apache Spark powers data infrastructure worldwide, but at petabyte scale, performance bottlenecks translate directly into massive infrastructure costs. At Xonai, we're solving that with a novel engine purpose-built to dramatically accelerate Spark jobs at their core, without requiring data teams to change how they work.

We've raised $4.5M in seed funding to build the best-in-class data infrastructure optimisation engine for the AI era.

About the Role

Optimising data infrastructure at scale means delivering breakthrough data processing acceleration to what actually drives up costs: the software deployed at scale by data engineering teams. As a Senior Software Engineer in this role, you will collaborate with the founding team on the implementation of a next-generation accelerator for Apache Spark, the most widely used Big Data processing engine at petabyte scale. Working at the intersection of compilers and Big Data analytics, you’ll drive state-of-the-art implementation of algorithms and techniques that span the entire software stack, from SQL pushdown to enhancements in low-level C++ data processing APIs and beyond. Your contributions to our core IP will directly impact data processing infrastructure that transforms petabytes of data every day wherever Xonai is deployed.

Responsibilities

Own SQL-acceleration projects end-to-end from planning to implementation and measurable outcomes
Implement algorithms (targeting a custom DSL) for complex SQL functions within a Scala codebase
Drive performance optimisations for Big Data processing algorithms primarily in Java but also C++
Research and develop greenfield projects at the intersection of Big Data analytics and compilers

You may be a good fit if you:

Have 5+ years of software engineering experience working in large performance-driven codebases
Have strong experience with statically-typed compiled languages (in particular C++, Java and Scala)
Can navigate large codebases that plumb low-level APIs into high-level operations
Have experience deploying or benchmarking data processing software to clusters in the cloud
Leave your comfort zone to tackle challenges across a multi-language software stack
Solve challenging problems independently and know when to pull others in

Strong candidates may have:

Entrepreneurial spirit and previous experience in early-stage startups
Experience contributing to popular open-source projects in the domain of data processing
Experience or familiarity with the implementation of large-scale query engines
Experience interfacing with OS or systems-level APIs with nuanced compatibility characteristics

Representative projects:

Implement (in a custom DSL) a major SQL operator (e.g. Window aggregate) or smaller SQL expressions
Migrate a complex code-generated algorithm in a custom DSL to a C++ equivalent
Invoke Java code via C++ JNI to provide execution of a SQL function for compatibility reasons
Implement a variant of a repartitioning algorithm optimised for edge-case application configurations
Refactor existing code generation algorithms to accommodate new improvements of a custom DSL
Improve GitHub Actions workflows to accommodate new build options or new comparative benchmarks

Our Commitment

We are highly committed to creating transformative technologies that deliver unique benefits to our customers. We understand that developing a best-in-class product requires a diverse team of intelligent, passionate and curious people bringing new perspectives. We take great pride in being an equal opportunity employer and we encourage everyone to apply.

Senior Software Engineer, Performance