Statistical Machine Learning for Medicinal Plant Identification

Algorithm for leaves classification

MEDIPI (MEDIcinal Plant Identification) algorithm

Practitioners usually identify medicinal plants through years of experience, relying on sensory cues, including smell. The alternative is laboratory-based testing, which requires trained personnel and data interpretation, and is costly and time-intensive. Automatic identification of medicinal plants is therefore useful, especially for people who lack experience in herbal recognition.

We introduce a new, computationally efficient algorithm for medicinal plant classification, which we call MEDIPI: MEDIcinal Plant Identification. MEDIPI has two main phases: (i) the offline phase and (ii) the online phase. The algorithm operates on a set of interpretable features computed from leaf images. The main steps of the offline phase include (i) image processing, (ii) feature extraction, and (iii) training the classification algorithm. Figure 1 provides an overview of the algorithm, Figure 2 shows a selected set of the features it uses, and Figure 3 shows the image processing steps.

Figure 1: Overview of MEDIPI algorithm

Figure 2: Leaf image features

Figure 3: Image processing workflow
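To make the offline/online split concrete, here is a minimal sketch in plain Python. It is not the MEDIPI implementation: the thresholding step, the two shape features (bounding-box aspect ratio and extent), the nearest-centroid classifier, the toy images and the species names are all assumptions chosen for illustration. The features and classifier actually used are described in the paper listed under Research outputs.

```python
from collections import defaultdict
from math import dist


def to_mask(gray, threshold=128):
    """Image processing (simplified assumption): dark pixels are leaf."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def extract_features(mask):
    """Compute two interpretable shape features from a binary leaf mask."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask contains no leaf pixels")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    area = len(coords)
    return (
        width / height,           # aspect ratio of the bounding box
        area / (width * height),  # extent: share of the box covered by leaf
    )


def train(labelled_images):
    """Offline phase: store the mean feature vector of each species."""
    grouped = defaultdict(list)
    for gray, species in labelled_images:
        grouped[species].append(extract_features(to_mask(gray)))
    return {
        species: tuple(sum(col) / len(col) for col in zip(*vectors))
        for species, vectors in grouped.items()
    }


def classify(model, gray):
    """Online phase: return the species whose centroid is closest."""
    features = extract_features(to_mask(gray))
    return min(model, key=lambda species: dist(model[species], features))


def rectangle(height, width, size=8):
    """Toy grayscale image: a dark rectangle (the leaf) on a white background."""
    return [[0 if r < height and c < width else 255 for c in range(size)]
            for r in range(size)]


# Hypothetical training set with two made-up species.
model = train([
    (rectangle(6, 2), "narrow_leaf"),
    (rectangle(7, 2), "narrow_leaf"),
    (rectangle(4, 6), "broad_leaf"),
    (rectangle(5, 6), "broad_leaf"),
])

print(classify(model, rectangle(8, 3)))  # narrow_leaf
print(classify(model, rectangle(5, 7)))  # broad_leaf
```

The point of the split is that all the costly work (processing the labelled images and fitting the model) happens once in `train`, while `classify` only has to process a single new image and compare it against the stored model.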

Benchmark dataset for plant leaves classification

Researchers often spend a lot of time building a database by gathering many leaf samples as raw data. By sharing our database, we give other researchers a training/test database for developing new algorithms or evaluating existing ones. Data sharing also encourages connections and collaboration between scientists, which can lead to better decision-making.

R Software Package MedLEA: Medicinal LEAf

The MedLEA package provides two datasets.

i) A dataset of morphological and structural features of 471 medicinal plant leaves. The features of each species are recorded by manually viewing the medicinal plant repository available at ( For more information visit at

Figure 4: Some morphological characteristics of profiles

ii) Leaf image data set: A database of leaf images of medicinal plants in Sri Lanka is not yet available. Hence, through this research, we establish a repository of medicinal plant images. This repository contains 1099 leaf images of 31 species. There are 29-45 images per species.

Figure 5: A few samples from the MedLEA images

You can get access to the data set via the MedLEA package.


Research outputs


Lakshika, J. P., & Talagala, T. S. (2021). Computer-aided Interpretable Features for Leaf Image Classification. arXiv preprint arXiv:2106.08077.


Jayani P. G. Lakshika and Thiyanga S. Talagala (2021). MedLEA: Morphological and Structural Features of Medicinal Leaves. R package version 1.0.1.

Conference talks

Click here

Research reproducibility

R code and data to reproduce the results in the paper “Computer-aided Interpretable Features for Leaf Image Classification” are available at

This work is part of Jayani P. G. Lakshika's BSc (Hons) Degree in Statistics thesis (2021), which was supervised by me.