Statistical Machine Learning for Medicinal Plant Identification

Benchmark dataset for plant leaf classification

Building a leaf database is time-consuming: researchers must gather many leaf samples as raw data before they can develop or test any method. By sharing our database, we give other researchers a training/test database for developing new algorithms or evaluating their existing ones. Data sharing also encourages connections and collaboration between scientists, which can lead to better decision-making.

R Software Package MedLEA: Medicinal LEAf

The MedLEA package provides two datasets.

i) A dataset of morphological and structural features of 471 medicinal plant leaves. The features of each species are recorded by manually viewing the medicinal plant repository available at ( For more information visit at

Figure 1: Some morphological characteristics of profiles

ii) Leaf image data set: A database of leaf images of medicinal plants in Sri Lanka was not previously available, so through this research we establish a repository of medicinal plant images. The repository contains 1099 leaf images of 31 species, with 29-45 images per species.

Figure 2: A few samples from the MedLEA images

You can access the data set via the MedLEA package.
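
As a rough illustration of how an image collection of this shape (31 species, 29-45 images each) might be used as a training/test database, the sketch below splits the images species by species, so that every species appears in both sets. The species names, the per-species counts, the 80/20 ratio, and the function `stratified_split` are placeholders invented for this example; they are not the MedLEA contents or its interface.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)

    # Group image indices by their species label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Hold out at least one image per species.
        n_test = max(1, round(len(members) * test_fraction))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Placeholder labels: 31 hypothetical species with 29 to 45 images each.
# The real per-species counts are not reproduced here.
count_rng = random.Random(42)
labels = []
for s in range(31):
    labels.extend([f"species_{s:02d}"] * count_rng.randint(29, 45))

train_idx, test_idx = stratified_split(labels)
print(len(labels), len(train_idx), len(test_idx))
```

Splitting within each species, rather than over the pooled images, keeps the smaller classes from being left out of the test set by chance.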


Algorithm for leaf classification

MEDIPI (MEDIcinal Plant Identification) algorithm

Practitioners usually identify medicinal plants from years of experience, relying on their senses, including smell. The alternative is laboratory-based testing, which requires specialist skills and data interpretation and is costly and time-intensive. Automatic identification of medicinal plants is therefore useful, especially for people who lack experience in herbal recognition. We introduce a new, computationally efficient algorithm for medicinal plant classification, which we call MEDIPI (MEDIcinal Plant Identification). MEDIPI has two main phases: (i) the offline phase and (ii) the online phase. The algorithm operates on a set of interpretable features computed from the images. Figure 3 provides an overview of our algorithm.

Figure 3: Overview of MEDIPI algorithm
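
The two-phase structure can be illustrated with a toy sketch. It is not the MEDIPI implementation: it assumes that the offline phase fits a model to features of labelled images and that the online phase applies that model to a new image, which the description above does not spell out. The two shape features, the nearest-centroid classifier, and all function and species names are stand-ins chosen for brevity.

```python
import math


def leaf_features(mask):
    """Two interpretable shape features from a non-empty binary leaf mask
    (a list of rows of 0/1): bounding-box aspect ratio and fill ratio."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    area = sum(sum(row) for row in mask)
    return (width / height, area / (width * height))


def offline_phase(labelled_masks):
    """Assumed offline phase: featurise labelled images and
    store one feature centroid per species."""
    sums = {}
    for species, mask in labelled_masks:
        f = leaf_features(mask)
        total, n = sums.get(species, ((0.0, 0.0), 0))
        sums[species] = ((total[0] + f[0], total[1] + f[1]), n + 1)
    return {s: (t[0] / n, t[1] / n) for s, (t, n) in sums.items()}


def online_phase(model, mask):
    """Assumed online phase: featurise a new image and return
    the species whose centroid is nearest."""
    f = leaf_features(mask)
    return min(model, key=lambda s: math.dist(f, model[s]))


# Toy masks standing in for segmented leaf images (hypothetical species).
narrow = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]
broad = [[0, 1, 1, 0],
         [1, 1, 1, 1],
         [0, 1, 1, 0]]

model = offline_phase([("narrow_leaf", narrow), ("broad_leaf", broad)])

query = [[1, 1, 1],
         [1, 1, 1]]
print(online_phase(model, query))  # prints "broad_leaf"
```

Keeping the expensive work in the offline phase means the online phase only computes a handful of features and a few distances per query, which is one way such a split can keep identification fast.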

This work is part of Jayani P G Lakshika's BSc (Hons) Degree in Statistics thesis (2021), which I supervised.

Thiyanga S. Talagala
PhD in Statistics