Apache.org

Website

Project list, by name, (as of 6/21/2021):

HTTP Server
APISIX: A dynamic cloud-native API Gateway.
Accumulo: a sorted, distributed key/value store that provides robust, scalable data storage and retrieval.
ActiveMQ: the most popular open source, multi-protocol, Java-based message broker.
Airavata: a software framework that enables you to compose, manage, execute, and monitor large scale applications and workflows on distributed computing resources such as local clusters, supercomputers, computational grids, and computing clouds.
Airflow: a platform created by the community to programmatically author, schedule and monitor workflows.
Allura: open-source project hosting platform.
Ambari: aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs.
Ant: a Java library and command-line tool whose mission is to drive processes described in build files as targets and extension points dependent upon each other.
Any23: a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents.
Archiva: build artifact repository manager.
Aries: a set of pluggable Java components enabling an enterprise OSGi application programming model.
Arrow: A cross-language development platform for in-memory analytics.
AsterixDB: a scalable, open source Big Data Management System (BDMS).
Atlas: a scalable and extensible set of core foundational governance services that enables enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.
Avro: a data serialization system.
Axis: Apache Axis and its second generation, Apache Axis2, are two Web Service containers that help users create, deploy, and run Web Services.
BVal: delivers an implementation of the Java Bean Validation Specification which is TCK compliant, works on Java SE 8 or later, and uses the Apache Software License v2.0.
Bahir: provides extensions to multiple distributed analytic platforms, extending their reach with a diversity of streaming connectors and SQL data sources. Currently provides extensions for Apache Spark and Apache Flink.
Beam: An advanced unified programming model; implement batch and streaming data processing jobs that run on any execution engine.
Bigtop: an Apache Foundation project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components. Bigtop supports a wide range of components/projects, including, but not limited to, Hadoop, HBase and Spark.
Bloodhound: successor to Trac. Manage software products. Keep track of features, tasks and bugs.
BookKeeper: A scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads.
Brooklyn: Your applications, any clouds, any containers, anywhere.
Buildr: a build system for Java-based applications, including support for Scala, Groovy and a growing number of JVM languages and tools.
CXF: an open source services framework that helps you build and develop services using frontend programming APIs, like JAX-WS and JAX-RS, which can speak a variety of protocols such as SOAP, XML/HTTP, RESTful HTTP, or CORBA and work over a variety of transports such as HTTP, JMS or JBI.
Calcite: The foundation for your next high-performance database. Industry-standard SQL parser; Query optimization; connect to third-party sources, browse metadata, and optimize by pushing the computation to the data.
- Avatica: a framework for building database drivers; defined by a wire API between a client and a server. The Avatica server is an HTTP server, the Avatica client is a JDBC driver, and the wire API is defined by JSON or Protocol Buffers. The flexibility of the wire API and HTTP transport allows other Avatica clients to be built in any language, implementing any client specification.
Camel: an Open Source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.
CarbonData: an indexed columnar data file format for fast analytics on big data platforms, e.g. Apache Hadoop, Apache Spark, etc.
Cassandra
Cayenne: an open source Java object-to-relational mapping framework.
Celix: An implementation of the OSGi specification adapted to C and C++.
Chemistry: provides open source implementations of the Content Management Interoperability Services (CMIS) specification (an OASIS standard enabling information sharing between different Content Management Systems).
Clerezza: a set of Java libraries for management of semantically linked data. Contents are stored as triples based on the W3C RDF specification. Apache Clerezza defines a technology-agnostic layer to access and modify triple stores. It provides a Java implementation of the graph data model specified by W3C RDF and functionalities to operate on that data model. Apache Clerezza offers a service interface to access multiple named graphs and it can use various providers to manage RDF graphs in a technology-specific manner, e.g., using Jena or Sesame. It also provides adaptors that allow an application to use various APIs (including the Jena API) to process RDF graphs. Furthermore, Apache Clerezza offers a serialization and a parsing service to convert a graph into a certain representation (format) and vice versa.
CloudStack: open source software designed to deploy and manage large networks of virtual machines, as a highly available, highly scalable Infrastructure as a Service (IaaS) cloud computing platform. A turnkey solution that includes the entire "stack" of features most organizations want with an IaaS cloud: compute orchestration, Network-as-a-Service, user and account management, a full and open native API, resource accounting, and a first-class User Interface (UI).
Cocoon: a Spring-based framework (since version 2.2 of Cocoon) built around the concepts of separation of concerns and component-based development. Cocoon implements these concepts around the notion of component pipelines, each component on the pipeline specializing in a particular operation.
Commons: an Apache project focused on all aspects of reusable Java components and is composed of three parts:
- Proper: reusable Java components.
  - BCEL: Byte Code Engineering Library - analyze, create, and manipulate Java class files
  - BeanUtils: Easy-to-use wrappers around the Java reflection and introspection APIs.
  - BSF: Bean Scripting Framework - interface to scripting languages, including JSR-223
  - Chain: Chain of Responsibility pattern implementation.
  - CLI: Command Line arguments parser.
  - Codec: General encoding/decoding algorithms (for example phonetic, base64, URL).
  - Collections: Extends or augments the Java Collections Framework.
  - Compress: Defines an API for working with tar, zip and bzip2 files.
  - Configuration: Reading of configuration/preferences files in various formats.
  - Crypto: A cryptographic library optimized with AES-NI wrapping Openssl or JCE algorithm implementations.
  - CSV: Component for reading and writing comma separated value files.
  - Daemon: Alternative invocation mechanism for Unix-daemon-like Java code.
  - DBCP: Database connection pooling services.
  - DbUtils: JDBC helper library.
  - Digester: XML-to-Java-object mapping utility.
  - Email: Library for sending e-mail from Java.
  - Exec: API for dealing with external process execution and environment management in Java.
  - FileUpload: File upload capability for your servlets and web applications.
  - Functor: A functor is a function that can be manipulated as an object, or an object representing a single, generic function.
  - Geometry: Space and coordinates.
  - Imaging (previously called Sanselan): A pure-Java image library.
  - IO: Collection of I/O utilities.
  - JCI: Java Compiler Interface
  - JCS: Java Caching System
  - Jelly: XML based scripting and processing engine.
  - Jexl: Expression language which extends the Expression Language of the JSTL.
  - JXPath: Utilities for manipulating Java Beans using the XPath syntax.
  - Lang: Provides extra functionality for classes in java.lang.
  - Logging: Wrapper around a variety of logging API implementations.
  - Math: Lightweight, self-contained mathematics and statistics components.
  - Net: Collection of network utilities and protocol implementations.
  - Numbers: Number types (complex, quaternion, fraction) and utilities (arrays, combinatorics).
  - OGNL: An Object-Graph Navigation Language
  - Pool: Generic object pooling component.
  - Proxy: Library for creating dynamic proxies.
  - RDF: Common implementation of RDF 1.1 that could be implemented by systems on the JVM.
  - RNG: Implementations of random numbers generators.
  - SCXML: An implementation of the State Chart XML specification aimed at creating and maintaining a Java SCXML engine. It is capable of executing a state machine defined using a SCXML document, and abstracts out the environment interfaces.
  - Statistics: Statistics.
  - Text: Apache Commons Text is a library focused on algorithms working on strings.
  - Validator: Framework to define validators and validation rules in an xml file.
  - VFS: Virtual File System component for treating files, FTP, SMB, ZIP and such like as a single logical file system.
  - Weaver: Provides an easy way to enhance (weave) compiled bytecode.
- Sandbox: components under development.
- Dormant: components currently inactive.
Cordova
CouchDB
Creadur: This project started as just Rat, a release auditing tool good with licenses, coded in Java with plugins for Ant and Maven; it became clear that creating one uber-tool in one language is less useful than collecting a suite:
- Apache Rat audits license headers — the boilerplate text needed in most source files. Coded in Java, it runs from the command line with plugins for Maven and Ant.
- Apache Tentacles helps to audit in bulk components uploaded to a staging repository. Coded in Java, it runs from the command line.
- Apache Whisker helps assembled applications maintain correct legal documentation. Coded in Java, it runs from the command line or as a plugin for Maven.
Curator: a Java/JVM client library for Apache ZooKeeper, a distributed coordination service. It includes a high-level API framework and utilities to make using Apache ZooKeeper much easier and more reliable. It also includes recipes for common use cases and extensions such as service discovery and a Java 8 asynchronous DSL.
cTAKES: a natural language processing system for extraction of information from electronic medical record clinical free-text.
DB: charged with the creation and maintenance of commercial-quality, open-source, database solutions based on software licensed to the Foundation, for distribution at no charge to the public.
- Derby: an open source relational database implemented entirely in Java
- Java Data Objects (JDO): a standard way to access persistent data in databases, using plain old Java objects (POJO) to represent persistent data. The approach separates data manipulation (done by accessing Java data members in the Java domain objects) from database manipulation (done by calling the JDO interface methods).
- Torque: an object-relational mapper for Java. In other words, Torque lets you access and manipulate data in a relational database using Java objects. Unlike most other object-relational mappers, Torque does not use reflection to access user-provided classes, but it generates the necessary classes (including the Data Objects) from an XML schema describing the database layout. The XML file can either be written by hand or a starting point can be generated from an existing database. The XML schema can also be used to generate and execute a SQL script which creates all the tables in the database.
Daffodil: Open-source implementation of the Data Format Description Language to convert between fixed format data and XML, JSON, and other data structures.
DataFu: a collection of libraries for working with large-scale data in Hadoop:
- DataFu Spark: a collection of utils and user-defined functions for Apache Spark.
- DataFu Pig: a collection of user-defined functions and macros for Apache Pig.
- DataFu Hourglass: an incremental processing framework for Apache Hadoop in MapReduce.
DataSketches: A software library of stochastic streaming algorithms: In the analysis of big data there are often problem queries that don’t scale because they require huge compute resources and time to generate exact results. Examples include count distinct, quantiles, most-frequent items, joins, matrix computations, and graph analysis. If approximate results are acceptable, there is a class of specialized algorithms, called streaming algorithms, or sketches, that can produce results orders of magnitude faster and with mathematically proven error bounds. For interactive queries there may not be other viable alternatives, and in the case of real-time analysis, sketches are the only known solution. For any system that needs to extract useful information from big data these sketches are a required toolkit that should be tightly integrated into their analysis capabilities. This project is dedicated to providing a broad selection of sketch algorithms of production quality.
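To make the "approximate count distinct" idea concrete, here is a minimal sketch of one classic technique, a K-Minimum-Values (KMV) estimator, written with only the Python standard library. It is an illustration of the general approach, not the DataSketches API; the names `KMVSketch` and `hash_to_unit_interval` are hypothetical, and the choice of SHA-256 as the hash is an assumption made for determinism.

```python
import hashlib
import heapq


def hash_to_unit_interval(item):
    """Map a string to a pseudo-uniform float in [0, 1) (assumed hash: SHA-256)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Hypothetical K-Minimum-Values sketch: remembers only the k smallest hashes."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap of the k smallest hashes (stored negated)
        self._members = set()  # same hashes, for duplicate detection

    def update(self, item):
        h = hash_to_unit_interval(item)
        if h in self._members:
            return  # duplicates of a retained item change nothing
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest retained one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct items: exact
        kth_smallest = -self._heap[0]
        # If n distinct hashes are uniform on [0, 1), the k-th smallest sits
        # near k / n, which gives the standard estimator (k - 1) / kth_smallest.
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch(k=1024)
    for i in range(100_000):
        sketch.update(f"user-{i % 20_000}")  # 20,000 distinct values, repeated
    print("approximate distinct count:", round(sketch.estimate()))
```

Memory stays bounded by `k` no matter how long the stream is, and the relative error shrinks roughly as one over the square root of `k`, which is the trade-off the description above refers to.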
DeltaSpike: consists of a number of portable CDI extensions that provide useful features for Java application developers.
Directory: provides directory solutions entirely written in Java. These include a directory server, which has been certified as LDAP v3 compliant by the Open Group (ApacheDS), and Eclipse-based directory tools (Apache Directory Studio).
DolphinScheduler: A distributed and easy-to-extend visual workflow scheduler system.
Drill: Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage.
Druid: a high performance real-time analytics database.
Dubbo: a high-performance, Java-based open source RPC framework.
ECharts: An Open Source JavaScript Visualization Library.
Empire-db: a relational database abstraction layer and data persistence component that allows developers to take a much more SQL-centric approach in application development than traditional Object-relational mapping frameworks (ORM).
Felix: a community effort to implement the OSGi Framework and Service platform and other interesting OSGi-related technologies under the Apache license.
Fineract: open source software for financial services. Fineract provides a reliable, robust, and affordable solution for entrepreneurs, financial institutions, and service providers to offer financial services to the world’s 2 billion underbanked and unbanked. Fineract is aimed at innovative mobile and cloud-based solutions, and enables digital transaction accounts for all.
Flex: ActionScript, MXML, and the final resting place for the platform originally created by Adobe. Work appears to have stopped in 2017, and all major browsers announced end-of-life for support for Flash by the end of 2020.
Flink: a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
Flume: a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
Fluo: a distributed processing system that lets users make incremental updates to large data sets.
FreeMarker: a template engine: a Java library to generate text output (HTML web pages, e-mails, configuration files, source code, etc.) based on templates and changing data. Templates are written in the FreeMarker Template Language (FTL), which is a simple, specialized language (not a full-blown programming language like PHP). Usually, a general-purpose programming language (like Java) is used to prepare the data (issue database queries, do business calculations). Then, Apache FreeMarker displays that prepared data using templates. In the template you are focusing on how to present the data, and outside the template you are focusing on what data to present.
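The split between "what data to present" and "how to present it" can be sketched without FreeMarker itself. The example below uses Python's standard `string.Template` as a stand-in; it is not FTL and not the FreeMarker API, and the names `prepare_data`, `PAGE`, and `render` are hypothetical.

```python
from string import Template


def prepare_data(customer, orders):
    """Outside the template: decide WHAT to present (queries, calculations)."""
    total = sum(order["price"] * order["quantity"] for order in orders)
    return {
        "customer": customer,
        "item_count": len(orders),
        "total": f"{total:.2f}",
    }


# Inside the template: decide HOW to present the already-prepared data.
PAGE = Template(
    "Hello ${customer}, you ordered ${item_count} item(s) totalling ${total}."
)


def render(customer, orders):
    """Combine the prepared data model with the template to produce text."""
    return PAGE.substitute(prepare_data(customer, orders))


if __name__ == "__main__":
    orders = [{"price": 2.5, "quantity": 2}, {"price": 1.0, "quantity": 3}]
    print(render("Ada", orders))
```

Changing the wording of the page only touches `PAGE`, while changing how the total is computed only touches `prepare_data`.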
Geode: a data management platform that provides real-time, consistent access to data-intensive applications throughout widely distributed cloud architectures. Geode pools memory, CPU, network resources, and optionally local disk across multiple processes to manage application objects and behavior. It uses dynamic replication and data partitioning techniques to implement high availability, improved performance, scalability, and fault tolerance. In addition to being a distributed data container, Apache Geode is an in-memory data management system that provides reliable asynchronous event notifications and guaranteed message delivery. Apache Geode is a mature, robust technology originally developed by GemStone Systems. Commercially available as GemFire, it was first deployed in the financial sector as the transactional, low-latency data engine used in Wall Street trading platforms.
Geronimo: an open source server runtime that integrates the best open source projects to create Java/OSGi server runtimes that meet the needs of enterprise developers and system administrators. Our most popular distribution has previously been a fully certified Java EE 6 application server runtime. Now we are refocusing on providing JavaEE/JakartaEE libraries and Microprofile implementations.
Giraph: an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections. Giraph originated as the open-source counterpart to Pregel, the graph processing architecture developed at Google and described in a 2010 paper. Both systems are inspired by the Bulk Synchronous Parallel model of distributed computation introduced by Leslie Valiant. Giraph adds several features beyond the basic Pregel model, including master computation, sharded aggregators, edge-oriented input, out-of-core computation, and more.
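The Bulk Synchronous Parallel idea behind Pregel-style systems can be illustrated with a small single-machine simulation: in each superstep every vertex that received messages runs its compute step, sends messages along its out-edges, and those messages only become visible in the next superstep. The sketch below computes single-source shortest paths this way. It is not Giraph's API; the function name `pregel_shortest_paths` and the adjacency-dict input format are assumptions, and edge weights are assumed to be non-negative.

```python
import math
from collections import defaultdict


def pregel_shortest_paths(edges, source):
    """Simulate vertex-centric supersteps for single-source shortest paths.

    edges maps each vertex to a list of (neighbor, weight) pairs.
    Returns (distances, number_of_supersteps).
    """
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(neighbor for neighbor, _ in targets)

    distance = {vertex: math.inf for vertex in vertices}
    inbox = {source: [0.0]}  # messages visible at the start of superstep 0
    supersteps = 0

    while inbox:  # the computation halts when no messages are in flight
        outbox = defaultdict(list)
        # Compute phase: each vertex looks only at its own state and messages.
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < distance[vertex]:
                distance[vertex] = best
                for neighbor, weight in edges.get(vertex, []):
                    outbox[neighbor].append(best + weight)
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox
        supersteps += 1

    return distance, supersteps


if __name__ == "__main__":
    graph = {
        "a": [("b", 1), ("c", 4)],
        "b": [("c", 2), ("d", 6)],
        "c": [("d", 1)],
        "d": [],
    }
    distances, steps = pregel_shortest_paths(graph, "a")
    print(distances, steps)
```

In a real deployment the vertices would be partitioned across workers and the barrier would be a cluster-wide synchronization point; here the loop body plays both roles.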
Gobblin: A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Gora: provides an in-memory data model and persistence for big data. Gora supports persisting to column stores, key value stores, document stores, distributed in-memory key/value stores, in-memory data grids, in-memory caches, distributed multi-model stores, and hybrid in-memory architectures.
Griffin: Big Data Quality Solution For Batch and Streaming.
Groovy: a dynamically-typed programming language for the JVM.
Guacamole: a clientless remote desktop gateway. It supports standard protocols like VNC, RDP, and SSH. We call it clientless because no plugins or client software are required. Thanks to HTML5, once Guacamole is installed on a server, all you need to access your desktops is a web browser.
Gump: a continuous integration tool. It is written in Python and fully supports Apache Ant, Apache Maven (1.x to 3.x) and other build tools. Gump is unique in that it builds and compiles software against the latest development versions of those projects. This allows Gump to detect potentially incompatible changes to that software just a few hours after those changes are checked into the version control system. Notifications are sent to the project team as soon as such a change is detected, referencing more detailed reports available online.
HAWQ: Apache Hadoop Native SQL. Advanced Analytics MPP Database for Enterprises.
HBase: the Hadoop database, a distributed, scalable, big data store.
Hadoop: a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Helix: a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix automates reassignment of resources in the face of node failure and recovery, cluster expansion, and reconfiguration.
Hive: data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
HttpComponents: responsible for creating and maintaining a toolset of low level Java components focused on HTTP and associated protocols.
Hudi: ingests & manages storage of large analytical datasets over DFS (hdfs or cloud stores).
Iceberg: an open table format for huge analytic datasets. Iceberg adds tables to Trino and Spark that use a high-performance format that works just like a SQL table.
Ignite: Distributed Database For High-Performance Computing With In-Memory Speed.
Impala: the open source, native analytic database for Apache Hadoop.
IoTDB: an IoT native database with high performance for data management and analysis, deployable on the edge and the cloud. Due to its light-weight architecture, high performance and rich feature set together with its deep integration with Apache Hadoop, Spark and Flink, Apache IoTDB can meet the requirements of massive data storage, high-speed data ingestion and complex data analysis in the IoT industrial fields.
Isis: NakedObjects implementation on the JVM.
JMeter: a 100% pure Java application designed to load test functional behavior and measure performance. It was originally designed for testing Web Applications but has since expanded to other test functions.
JSPWiki: a leading open source WikiWiki engine, feature-rich and built around standard JEE components (Java, servlets, JSP).
Jackrabbit: a fully conforming implementation of the Content Repository for Java Technology API (JCR, specified in JSR 170 and JSR 283). A content repository is a hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more.
James: Java Apache Mail Enterprise Server. It has a modular architecture based on a rich set of modern and efficient components that together provide complete, stable, secure and extendable mail servers running on the JVM. Build your own email-processing solution by assembling the components you need on the Inversion of Control mail platform, then customize filtering and routing rules using the James Mailet Container.
Jena: A free and open source Java framework for building Semantic Web and Linked Data applications.
Johnzon: a project providing an implementation of JSON Processing (aka JSR-353) and a set of useful extensions for this specification, such as an object mapper, some JAX-RS providers and a WebSocket module that provides a basic integration with the Java WebSocket API (JSR-356).
Joshua: a statistical machine translation decoder for phrase-based, hierarchical, and syntax-based machine translation, written in Java.
Juneau: a single cohesive Java ecosystem consisting of the following parts:
- juneau-core
  - juneau-marshall A universal toolkit for marshalling POJOs to a variety of content types using a common framework with no external library dependencies.
  - juneau-marshall-rdf Extended marshalling support for RDF languages.
  - juneau-dto A variety of predefined DTOs for serializing and parsing languages such as HTML5, Swagger and ATOM.
  - juneau-config A sophisticated configuration file API.
- juneau-rest
  - juneau-rest-server A universal REST server API for creating Swagger-based self-documenting REST interfaces using POJOs, simply deployed as one or more top-level servlets in any Servlet 3.1.0+ container. Includes Spring Boot and JAX-RS integration support.
  - juneau-rest-client A universal REST client API for interacting with Juneau or 3rd-party REST interfaces using POJOs and proxy interfaces.
jUDDI: an open source Java implementation of the OASIS Universal Description, Discovery, and Integration (UDDI) specification for (Web) Services. The jUDDI project includes Scout, an implementation of the JSR 93 Java API for XML Registries 1.0 (JAXR).
jclouds: an open source multi-cloud toolkit for the Java platform that gives you the freedom to create applications that are portable across clouds while giving you full control to use cloud-specific features.
Kafka: an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Karaf: provides a modulith runtime for the enterprise, running on premise or in the cloud. Focus on your business code and applications; Apache Karaf deals with the rest.
Kibble: a suite of tools for collecting, aggregating and visualizing activity in software projects.
Knox: an Application Gateway for interacting with the REST APIs and UIs of Apache Hadoop deployments. The Knox Gateway provides a single access point for all REST and HTTP interactions with Apache Hadoop clusters.
Kudu: an open source distributed data storage engine that makes fast analytics on fast and changing data easy.
Kylin: an open source, distributed Analytical Data Warehouse for Big Data; it was designed to provide OLAP (Online Analytical Processing) capability in the big data era. By renovating the multi-dimensional cube and precalculation technology on Hadoop and Spark, Kylin is able to achieve near constant query speed regardless of the ever-growing data volume. Reducing query latency from minutes to sub-second, Kylin brings online analytics back to big data.
Libcloud: Python library for interacting with many of the popular cloud service providers using a unified API.
Logging Services: creates and maintains open-source software related to the logging of application behavior and released at no charge to the public.
- Log4J
- log4cxx
- Log4Net
- Log4JKotlin
- Log4JScala
- chainsaw
- Log4JAudit
Lucene: releases a core search library, named Lucene™ Core, as well as PyLucene, a Python binding for Lucene. Lucene Core is a Java library providing powerful indexing and search features, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
Lucene.Net: a high performance search engine library for .NET.
MADlib: Big Data Machine Learning in SQL.
MINA: a network application framework which helps users develop high performance and high scalability network applications easily. It provides an abstract event-driven asynchronous API over various transports such as TCP/IP and UDP/IP via Java NIO.
Mahout: a distributed linear algebra framework and mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed back-end, or can be extended to other distributed backends.
ManifoldCF: an effort to provide an open source framework for connecting source content repositories like Microsoft Sharepoint and EMC Documentum, to target repositories or indexes, such as Apache Solr, Open Search Server, or ElasticSearch. Apache ManifoldCF also defines a security model for target repositories that permits them to enforce source-repository security policies.
Maven: a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.
Mesos: abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.
Mnemonic: an advanced library oriented toward hybrid memory and storage. It proposes a non-volatile/durable Java object model and a durable computing service that bring several advantages to significantly improve the performance of massive real-time data processing/analytics. Developers can use this library to design cache-less and SerDe-less high performance applications.
MyFaces: a project of the Apache Software Foundation, and hosts several sub-projects relating to the JavaServer™ Faces (JSF) technology.
- Projects
  - MyFaces Core | Implementation of the JSF specification
  - Apache Tobago | A component library
- Inactive Projects (Maintenance Mode)
  - MyFaces Commons | Utilities like components, converters, validators
  - MyFaces Tomahawk | A component library
  - MyFaces Trinidad | A component library (former Oracle ADF-Faces)
  - MyFaces Orchestra | Utility library based on Spring
  - MyFaces Extensions Validator | Validation framework based on annotations
  - MyFaces Extensions CDI | Utility library based on CDI
  - MyFaces Extensions Scripting | Adds scripting and rapid prototyping (hot deployment) to JSF
  - MyFaces Portlet Bridge | Bridge between Portlets and JSF
Mynewt: An OS to build, deploy and securely manage billions of devices.
NetBeans: IDE.
NiFi: An easy to use, powerful, and reliable system to process and distribute data. Supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Some of the high-level capabilities and objectives of Apache NiFi include: a Web-based user interface (seamless experience between design, control, feedback, and monitoring); high configurability (loss tolerant vs guaranteed delivery, low latency vs high throughput, dynamic prioritization, flows can be modified at runtime, back pressure); data provenance (track dataflow from beginning to end); design for extension (build your own processors and more, enables rapid development and effective testing); and security (SSL, SSH, HTTPS, encrypted content, etc., plus multi-tenant authorization and internal authorization/policy management).
Nutch: a well matured, production ready Web crawler. Nutch 1.x enables fine grained configuration, relying on Apache Hadoop data structures, which are great for batch processing. Being pluggable and modular has its benefits: Nutch provides extensible interfaces such as Parse, Index and ScoringFilter for custom implementations, e.g. Apache Tika for parsing. Additionally, pluggable indexing exists for Apache Solr, Elastic Search, SolrCloud, etc.
OFBiz: a suite of business applications flexible enough to be used across any industry. A common architecture allows developers to easily extend or enhance it to create custom features.
Object-Oriented Data Technology (OODT): Traditional processing pipelines are commonly made up of custom UNIX shell scripts and fragile custom written glue code. Apache OODT uses structured XML-based capturing of the processing pipeline that can be understood and modified by non-programmers to create, edit, manage and provision workflow and task execution. OODT also allows for remote execution of jobs on scalable computational infrastructures so that computational and data-intensive processing can be integrated into OODT’s data processing pipelines using cloud computing and high-performance computing environments.
ORC: the smallest, fastest columnar storage for Hadoop workloads.
Olingo: a Java library that implements the Open Data Protocol (OData). Apache Olingo serves client and server aspects of OData. It currently supports OData 2.0 and will also support OData 4.0. The latter is the OASIS version of the protocol: OASIS Open Data Protocol (OData) TC.
Oozie: a workflow scheduler system to manage Apache Hadoop jobs.
OpenJPA: a Java persistence project that can be used as a stand-alone POJO persistence layer or integrated into any Java EE compliant container and many other lightweight frameworks, such as Tomcat and Spring.
OpenMeetings: provides video conferencing, instant messaging, white board, collaborative document editing and other groupware tools. It uses the API functions of the Kurento Media Server for remoting and streaming.
OpenNLP: a machine learning based toolkit for the processing of natural language text.
OpenOffice: the free and open productivity suite from the Apache Software Foundation; features six personal productivity applications: a word processor (and its web-authoring component), spreadsheet, presentation graphics, drawing, equation editor, and database.
OpenWebBeans: delivers an implementation of the Contexts and Dependency Injection for Java EE (CDI) 2.0 Specification (JSR-365).
OpenWhisk: an open source, distributed Serverless platform that executes functions (fx) in response to events at any scale. OpenWhisk manages the infrastructure, servers and scaling using Docker containers so you can focus on building amazing and efficient applications. The OpenWhisk platform supports a programming model in which developers write functional logic (called Actions), in any supported programming language, that can be dynamically scheduled and run in response to associated events (via Triggers) from external sources (Feeds) or from HTTP requests. The project includes a REST API-based Command Line Interface (CLI) along with other tooling to support packaging, catalog services and many popular container deployment options.
Ozone: a scalable, redundant, and distributed object store for Hadoop. Apart from scaling to billions of objects of varying sizes, Ozone can function effectively in containerized environments such as Kubernetes and YARN.
PDFBox: an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents.
PLC4X: a set of libraries for communicating with industrial programmable logic controllers (PLCs) using a variety of protocols but with a shared API.
POI: Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). In short, you can read and write MS Excel files using Java. In addition, you can read and write MS Word and MS PowerPoint files using Java. Apache POI is your Java Excel solution (for Excel 97-2008). We have a complete API for porting other OOXML and OLE2 formats and welcome others to participate.
Parquet: a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
mod_perl: mod_perl brings together the full power of the Perl programming language and the Apache HTTP server.
Petri: The Apache Petri (as in “petri dish”, where cultures are grown and bloom) committee assists external project communities interested in becoming an Apache project to learn how The Apache Software Foundation (ASF) works, its views on community, and how to build a healthy community for the long-term. Petri’s mission is to mentor existing external communities (“cultures”) about “The Apache Way” by focusing on community governance that includes discussions about ASF policies. The mentoring and education are conducted on a mailing list. The primary goal is to reach a point where a recommendation to the ASF Board can be made to construct a new Apache Project Management Committee (PMC) for the external community.
Phoenix: OLTP and operational analytics for Apache Hadoop.
Pig: a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
Pivot: an open-source platform for building installable Internet applications (IIAs). It combines the enhanced productivity and usability features of a modern user interface toolkit with the robustness of the Java platform.
Portable Runtime (APR): create and maintain software libraries that provide a predictable and consistent interface to underlying platform-specific implementations. The primary goal is to provide an API to which software developers may code and be assured of predictable if not identical behaviour regardless of the platform on which their software is built, relieving them of the need to code special-case conditions to work around or take advantage of platform-specific deficiencies or features.
Portals: a collaborative software development project dedicated to providing robust, full-featured, commercial-quality, and freely available Portal related software on a wide variety of platforms and programming languages. Portals offer many advantages over other software applications. First, they provide a single point of entry for employees, partners, and customers. Second, portals can access Web services transparently from any device in virtually any location. Third, portals are highly flexible; they can exist in the form of B2E intranets, B2B extranets, or B2C internets. Fourth, portals can be combined to form a portal network that can span a company's entire enterprise system, allowing for access both inside and outside the firewall. Portals have many advantages, which is why they have become the de facto standard for Web application delivery. In fact, analysts have predicted that portals will become the next generation for the desktop environment. Portals distinguish themselves from other software systems because they provide the ability to integrate disparate systems and leverage the functionality provided by those systems. As such, they are not mutually exclusive, and do not force you into an either-or decision vis-a-vis existing software systems. This point is of paramount importance, particularly when you consider the fact that Web services are destined to fuel the explosion of Web applications. Since portals can access any Web services, the conclusion is inescapable: portals provide a unique opportunity to leverage the functionality of nascent technologies as well as mature, well-established software systems.
Pulsar: a cloud-native, distributed messaging and streaming platform.
Qpid: makes messaging tools that speak AMQP and support many languages and platforms.
Retainable Evaluator Execution Framework (REEF): a library for developing portable applications for cluster resource managers such as Apache Hadoop YARN or Apache Mesos. Apache REEF drastically simplifies development of applications for those resource managers through the following features:
- Centralized Control Flow: Apache REEF turns the chaos of a distributed application into events in a single machine, the Job Driver. Events include container allocation, Task launch, completion and failure. For failures, Apache REEF makes every effort to make the actual Exception thrown by the Task available to the Driver.
- Task runtime: Apache REEF provides a Task runtime called Evaluator. Evaluators are instantiated in every container of a REEF application. Evaluators can keep data in memory in between Tasks, which enables efficient pipelines on REEF.
- Support for multiple resource managers: Apache REEF applications are portable to any supported resource manager with minimal effort. Further, new resource managers are easy to support in REEF.
- .NET and Java API: Apache REEF is the only API to write YARN or Mesos applications in .NET. Further, a single REEF application is free to mix and match Tasks written for .NET or Java.
- Plugins: Apache REEF allows for plugins (called “Services”) to augment its feature set without adding bloat to the core. REEF includes many Services, such as name-based communication between Tasks, MPI-inspired group communications (Broadcast, Reduce, Gather, …) and data ingress.
Ranger: a framework to enable, monitor and manage comprehensive data security across the Hadoop platform.
Ratis: a highly customizable Raft protocol implementation in Java.
River: the implementation of Jini service oriented architecture. It defines a programming model which both exploits and extends Java technology to enable the construction of secure, distributed systems consisting of federations of services and clients. Jini technology can be used to build adaptive network systems that are scalable, evolvable and flexible as typically required in dynamic computing environments.
RocketMQ: a unified messaging engine and lightweight data processing platform.
Roller: a Java-based, full-featured, multi-user and group-blog server suitable for blog sites large and small.
Royale: a productive, open-source frontend application technology that lets you code in MXML & AS3 and output to different formats.
Rya: a Big Data triple store that provides scalable storage, retrieval, and analysis of RDF data.
SINGA: focusing on distributed training of deep learning and machine learning models.
Spatial Information System (SIS): a free software, Java language library for developing geospatial applications. SIS provides data structures for geographic features and associated metadata along with methods to manipulate those data structures.
Samza: A distributed stream processing framework.
Santuario: aimed at providing implementation of the primary security standards for XML: XML-Signature Syntax and Processing and XML Encryption Syntax and Processing. (For both Java and C++)
SeaTunnel: Next-gen high-perf distributed massive data integration tool.
Serf: a high performance C-based HTTP client library built upon the Apache Portable Runtime (APR) library.
ServiceComb: Open-Source, Full-Stack Microservice Solution. Works out of the box, with high performance, compatibility with popular ecosystems, and multi-language support.
ServiceMix: a flexible, open-source integration container that unifies the features and functionality of Apache ActiveMQ, Camel, CXF, and Karaf into a powerful runtime platform you can use to build your own integration solutions. It provides a complete, enterprise ready ESB exclusively powered by OSGi.
ShardingSphere: an open-source ecosystem consisting of a set of distributed database solutions, including 3 independent products: JDBC, Proxy & Sidecar (Planning). They all provide functions of data scale out, distributed transaction and distributed governance, applicable in a variety of situations such as Java isomorphism, heterogeneous language and cloud native.
Shiro: a powerful and easy-to-use Java security framework that performs authentication, authorization, cryptography, and session management. With Shiro’s easy-to-understand API, you can quickly and easily secure any application – from the smallest mobile applications to the largest web and enterprise applications.
SkyWalking: Application performance monitor tool for distributed systems, especially designed for microservices, cloud native and container-based (Docker, Kubernetes, Mesos) architectures.
Sling: a framework for RESTful web-applications based on an extensible content tree. In a nutshell, Sling maps HTTP request URLs to content resources based on the request's path, extension and selectors. Using convention over configuration, requests are processed by scripts and servlets, dynamically selected based on the current resource. This fosters meaningful URLs and resource driven request processing, while the modular nature of Sling allows for specialized server instances that include only what is needed.
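As a rough illustration of the URL mapping described in the Sling entry, the Python sketch below splits a request path into a resource path, selectors, and an extension. It is a simplification and not Sling's actual resolver: real Sling consults the content tree to decide where the resource path ends, whereas the hypothetical `decompose` function here simply assumes it ends at the first dot of the last path segment.

```python
from typing import NamedTuple, Optional


class RequestParts(NamedTuple):
    resource_path: str
    selectors: tuple
    extension: Optional[str]


def decompose(url_path: str) -> RequestParts:
    """Split a request path at the first dot of its last segment.

    Assumption for this sketch: the resource path ends at that dot.
    """
    head, _, last = url_path.rpartition("/")
    name, *dotted = last.split(".")
    resource_path = f"{head}/{name}"
    if not dotted:
        return RequestParts(resource_path, (), None)
    # Tokens between the resource name and the final token are selectors;
    # the final token is the extension.
    *selectors, extension = dotted
    return RequestParts(resource_path, tuple(selectors), extension)


parts = decompose("/content/blog/post.print.a4.html")
```

Under these assumptions, `parts` holds the resource path `/content/blog/post`, the selectors `print` and `a4`, and the extension `html`, which a framework could then use to pick a matching script or servlet.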
Solr: highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more.
SpamAssassin: anti-spam platform giving system administrators a filter to classify email and block spam (unsolicited bulk email). It uses a robust scoring framework and plug-ins to integrate a wide range of advanced heuristic and statistical analysis tests on email headers and body text including text analysis, Bayesian filtering, DNS blocklists, and collaborative filtering databases.
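The scoring idea in the SpamAssassin entry can be sketched in a few lines of Python: each test that matches contributes a score, and a message is flagged once the total reaches a threshold. The rule names, patterns, scores, and threshold below are invented for illustration; they are not SpamAssassin's real rules or configuration.

```python
import re

# Hypothetical rules as (name, pattern, score). Not real SpamAssassin rules.
RULES = [
    ("SUBJECT_ALL_CAPS", re.compile(r"^Subject: [A-Z !]+$", re.MULTILINE), 1.5),
    ("MENTIONS_FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 2.5),
    ("MANY_EXCLAMATIONS", re.compile(r"!{3,}"), 1.0),
]

THRESHOLD = 4.0  # assumed cutoff for this sketch


def score_message(raw: str):
    """Return the total score and the names of the rules that matched."""
    hits = [(name, score) for name, pattern, score in RULES if pattern.search(raw)]
    total = sum(score for _, score in hits)
    return total, [name for name, _ in hits]


spam = "Subject: WIN NOW!!!\n\nClaim your free money today."
ham = "Subject: Meeting notes\n\nAgenda attached."

spam_total, spam_hits = score_message(spam)
ham_total, ham_hits = score_message(ham)
```

Here the first message matches all three toy rules and crosses the assumed threshold, while the second matches none.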
Spark: a unified analytics engine for large-scale data processing.
Steve: Apache's Python based voting system that the Foundation uses to handle things like voting in our new Board of Directors.
Storm: a free and open source distributed realtime computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
Streams: unifies a diverse world of digital profiles and online activities into common formats and vocabularies, and makes these datasets accessible across a variety of databases, devices, and platforms for streaming, browsing, search, sharing, and analytics use-cases.
Struts: a free, open-source, MVC framework for creating elegant, modern Java web applications. It favors convention over configuration, is extensible using a plugin architecture, and ships with plugins to support REST, AJAX and JSON.
Submarine: Cloud Native Machine Learning Platform.
Subversion: an open source version control system.
Superset: a modern data exploration and visualization platform.
Synapse: a lightweight and high-performance Enterprise Service Bus (ESB). Powered by a fast and asynchronous mediation engine, Apache Synapse provides exceptional support for XML, Web Services and REST. In addition to XML and SOAP, Apache Synapse supports several other content interchange formats, such as plain text, binary, Hessian and JSON. The wide range of transport adapters available for Synapse enables it to communicate over many application and transport layer protocols. As of now, Apache Synapse supports HTTP/S, Mail (POP3, IMAP, SMTP), JMS, TCP, UDP, VFS, SMS, XMPP and FIX.
Syncope: an Open Source system for managing digital identities in enterprise environments, implemented in Java EE technology. Identity management (or IdM) means to manage user data on systems and applications, using the combination of business processes and IT. IdM involves considering user attributes, roles, resources and entitlements in trying to answer the following thorny question: Who has access to What, When, How, and Why?
SystemDS: A machine learning platform optimal for big data.
TVM: an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend.
Tapestry: A component-oriented framework for creating highly scalable web applications in Java.
Tcl: Apache-Tcl integration.
Tez: aimed at building an application framework which allows for a complex directed-acyclic-graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.
Thrift: The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages.
Tika: detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
TinkerPop: a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
TomEE: an all-Apache Jakarta EE 9.1 certified application server that extends Apache Tomcat and is assembled from a vanilla Apache Tomcat zip file. We start with Apache Tomcat, add our jars, and zip up the rest. The result is Tomcat plus EE features - TomEE.
Tomcat: an open source implementation of the Jakarta Servlet, Jakarta Server Pages, Jakarta Expression Language, Jakarta WebSocket, Jakarta Annotations and Jakarta Authentication specifications.
Traffic Control: allows you to build a large scale content delivery network using open source. Built around Apache Traffic Server as the caching software, Traffic Control implements all the core functions of a modern CDN.
Traffic Server: a fast, scalable and extensible HTTP/1.1 and HTTP/2 compliant caching proxy server.
Turbine: a servlet based framework that allows experienced Java developers to quickly build web applications. Turbine allows you to personalize web sites and to use user logins to restrict access to parts of your application.
UIMA: enables applications to be decomposed into components, for example "language identification" => "language specific segmentation" => "sentence boundary detection" => "entity detection (person/place names etc.)". Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages.
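The component chain in the UIMA entry can be pictured with the small Python sketch below, in which each stage reads a shared document and adds its own annotations before the next stage runs. This is only an analogy: the function names and the dictionary-based document are hypothetical, the heuristics are toys, and none of it is UIMA's actual API, which works with framework-defined interfaces and XML descriptor files.

```python
import re


# Each hypothetical component reads the shared document dict and annotates it.
def identify_language(doc):
    # Toy heuristic (assumption): text containing the word "the" is English.
    doc["language"] = "en" if " the " in f" {doc['text'].lower()} " else "unknown"


def detect_sentences(doc):
    # Naive boundary detection: split after ., ! or ? followed by whitespace.
    doc["sentences"] = [s for s in re.split(r"(?<=[.!?])\s+", doc["text"]) if s]


def detect_entities(doc):
    # Toy entity detection: capitalized words that do not start a sentence.
    entities = []
    for sentence in doc["sentences"]:
        for word in sentence.split()[1:]:
            token = word.strip(".,!?")
            if token[:1].isupper():
                entities.append(token)
    doc["entities"] = entities


def run_pipeline(text, components):
    """Pass one shared document through each component in order."""
    doc = {"text": text}
    for component in components:
        component(doc)
    return doc


result = run_pipeline(
    "The framework was presented in Berlin. Later the team met Alice.",
    [identify_language, detect_sentences, detect_entities],
)
```

Because every stage only depends on the annotations left by earlier stages, components can be reordered, replaced, or reused in other pipelines, which is the decomposition benefit the entry describes.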
Unomi: (pronounced "You know me") a Java Open Source customer data platform, a Java server designed to manage customer, lead and visitor data and help personalize customer experiences while also offering features to respect visitor privacy rules (such as GDPR).
VCL: a free & open-source cloud computing platform with the primary goal of delivering dedicated, custom compute environments to users. The compute environments can range from something as simple as a virtual machine running productivity software to a cluster of powerful physical servers running complex HPC simulations.
Velocity: a Java-based template engine. It permits anyone to use a simple yet powerful template language to reference objects defined in Java code.
Web Services: the home of a number of Web services related projects:
- Active Projects
  - Apache Axiom: An XML and SOAP object model which supports deferred parsing and on-demand building of the object tree.
  - Apache Neethi: A general framework for the programmers to use WS Policy.
  - Apache Woden: A Java class library for reading, manipulating, creating and writing WSDL documents.
  - Apache WSS4J: An implementation of the OASIS Web Services Security (WS-Security) from OASIS Web Services Security TC.
  - Apache XmlSchema: A Java class library for creating and traversing W3C XML Schema 1.0 documents.
- Archived Projects
  - Apache JaxMe: An implementation of JAXB, the specification for Java/XML binding.
  - Apache SOAP: A first generation SOAP stack.
  - Apache TCPMon: A tool to intercept SOAP/HTTP messages.
  - Apache WSIF: A simple Java API for invoking Web services, no matter how or where the services are provided.
  - Apache XML-RPC: A Java implementation of XML-RPC, a popular protocol that uses XML over HTTP to implement remote procedure calls.
Wicket: an open source, component oriented, serverside, Java web application framework.
XMLBeans: a technology for accessing XML by binding it to Java types.
XML Graphics: the conversion of XML formats to graphical output:
- Apache Batik - A toolkit for Scalable Vector Graphics (SVG), based in Java
- Apache FOP - A print formatter & renderer for XSL-FO (FO=formatting objects), based in Java
- Apache XML Graphics Commons - A library with various components used by Apache Batik and Apache FOP, written in Java
Xalan: develops and maintains libraries and programs that transform XML documents using XSLT standard stylesheets. Our subprojects use the Java and C++ programming languages to implement the XSLT libraries.
Xerces: XML parsers, Java and C++
Yetus: a collection of libraries and tools that enable contribution and release processes for software projects.
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
ZooKeeper: a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
Incubator
- AGE
- Annotator
- BlueMarlin
- brpc
- Crail
- DataLab
- Doris
- EventMesh
- Flagon
- Heron
- Hivemall
- Hop
- Liminal
- Livy
- Marvin-AI
- Teaclave
- Milagro
- MXNet
- Nemo
- NLPCraft
- NuttX
- PageSpeed
- Pegasus
- Pinot
- Pony Mail
- SDAP
- Sedona
- ShenYu
- Spot
- StreamPipes
- Training
- Toree
- InLong
- Tuweni
- Wayang
- YuniKorn
Attic
- Abdera
- ACE
- Apex
- Aurora
- Avalon
- AxKit
- Axis Sandesha2/C
- Axis Savan/C
- Axis Savan/Java
- Beehive
- Chukwa
- Click
- Open Climate Workbench
- Crimson
- Continuum
- Crunch
- Deltacloud
- DeviceMap
- DirectMemory
- DRAT
- Eagle
- ESME
- Etch
- Excalibur
- Falcon
- Forrest
- Hama
- Harmony
- HiveMind
- iBATIS
- Jakarta
- Jakarta Cactus
- Jakarta ECS
- Jakarta ORO
- Jakarta Regexp
- Jakarta Slide
- Jakarta Taglibs
- Labs
- Lens
- Lenya
- Lucy
- Marmotta
- Metron
- MRUnit
- ODE
- ObJectRelationalBridge (OJB)
- Oltu
- Onami
- Polygene
- PredictionIO
- Quetzalcoatl
- Rave
- Sentry
- Shale
- Shindig
- Standard C++ Library (STDCXX)
- Stanbol
- Stratos
- Tajo
- Tiles
- Trafodion
- Tuscany
- Twill
- Usergrid: an open-source Backend-as-a-Service ("BaaS" or "mBaaS") composed of an integrated distributed NoSQL database, application layer and client tier with SDKs for developers looking to rapidly build web and/or mobile applications. It provides elementary services (user registration & management, data storage, file storage, queues) and retrieval features (full text search, geolocation search, joins) to power common app features. It is a multi-tenant system designed for deployment to public cloud environments (such as Amazon Web Services, Rackspace, etc.) or to run on traditional server infrastructures so that anyone can run their own private BaaS deployment. For architects and back-end teams, it aims to provide a distributed, easily extendable, operationally predictable and highly scalable solution. For front-end developers, it aims to simplify the development process by enabling them to rapidly build and operate mobile and web applications without requiring backend expertise.
- VXQuery
- Whirr
- Wink
- Wookie
- WS Muse
- Xang
- Xindice
- XML

Tags: place jvm clr native library language platform reading storage presentation tool vm cloud backend

Last modified 11 October 2025

The Apache site is a collection of numerous open-source projects, in all stages of life (incubating, maintained, archived).