OTRS: Tool for Security Incident Reports Management

CESNET technical report number 12/2007
also available in PDF, PostScript, and XML formats.

Pavel Kácha
27.11.2007

Keywords: OTRS, CSIRT, security, incident, ticket management, issue management, metadata, template, statistics, plugin

1   Abstract

Last year, after a thorough review [Kac06], we decided to adopt OTRS (Open source Ticket Request System) to automate and streamline the flow of incident reports of the CESNET-CERTS team.

During that review, and later during the OTRS evaluation period, several issues arose that had to be addressed and solved.

This technical report documents the decisions, the important configuration changes, and the newly developed code and bugfixes that were necessary to accept OTRS as the central tool of the CSIRT workflow.

2   Requirements and decisions

Let us briefly recapitulate the important needs identified in the previous review and outline the reasoning and decisions that led to the solution.

Reliable mail communication
The system must be able to actively keep messages under the relevant ticket, even where the unreliable threading capabilities of mail fall short. OTRS solves this by inserting a unique identifier into the mail subject (which is seldom purged), and by searching through various (configurable) message parts to find this identifier again.
Ability to split and merge individual reports
Supported by OTRS as is.
Searchable metadata

OTRS supports arbitrary key/data pairs attached to each ticket; it is up to us to fill them with meaningful content.

We have to develop a plugin, or some other piece of code, which will analyse incoming mail, extract plausible-looking IP addresses, pair them with the network block name and the responsible administrator, and attach them as ticket metadata.

Unambiguous ownership of manager
Supported as is.
Templates for routine answers/forwards, possibly dynamically modifiable

OTRS supports dynamic templates for replying, but not for forwarding. As the handover of incident reports to the corresponding administrator is the key part of CSIRT work, this is a major drawback.

This feature clearly has to be implemented.

Ability to handle and produce signed/encrypted messages
Signing and verification of both PGP and S/MIME messages are supported. Encryption and decryption are not reliably supported in the current version, but work seems to be underway. As this is not a necessary feature (important data can be transferred by other means, or encapsulated in a PGP armoured blob as an attachment), we will wait and watch the upstream for now.

During the evaluation phase, and later after the real deployment, the following additional issues arose:

How exactly to represent particular parts of the workflow?
We have to devise a structure of queues, states and metadata that matches incident report flows and classification, balanced against usability and comprehensibility.
More distinct visual disambiguation of tickets
We would like to be informed of the ticket state more clearly. OTRS does not readily support state-based decisions in its templates, but with some changes to the code this should be achievable.
Timing out of tickets based on several criteria
Tickets which do not receive a reply in time, or are not handled in time, should rise in importance. OTRS supports so-called "ticket escalation", but the only OTRS way of escalating essentially means locking access to the other tickets in the queue until the escalated ticket is dealt with. This is not desirable; we would like more relaxed means of alerting, such as a visual change or a mail. Also, time-based actions in OTRS can depend only on the time of creation, not on the time of the last change. We have to develop another approach.
Spam
We could of course insert one of the classical antispam solutions into the incoming mail pipeline; however, as a large volume of reports concerns spam incidents, the risk of false positives rises noticeably. As doing without antispam measures renders today's mail practically unusable, we have to implement some solution.
Statistics
CESNET-CERTS operates, among other roles, as a mediator between the complainant and the responsible party. It is in our interest to monitor the responses of end institutions and draw conclusions. To accomplish this, we have to monitor the numbers of solved/unsolved incidents, possibly by rough type and by institution. A need for other types of statistics may arise in the future.

These points formed both the starting point and the boundaries for configuration and development, be it add-ons or behaviour changes. In the rest of the report we document the specific changes made, focusing mainly on things directly corresponding to these needs, and on some non-obvious configuration.

We started with version 2.0.3 and, during testing, deployment and usage, gradually upgraded to 2.1.2. This is also the version to which we refer in this document.

Some exact configuration values in this text have been intentionally changed to protect private details; the original values would be of no use outside our deployment anyway.

3   Configuration

This section addresses OTRS configuration. We will not dissect software installation, and basic features will be mentioned only briefly and sparsely; the reader is strongly advised to consult the Administrator manual [ADM]. We will concentrate on features worth mentioning in the context of security team management. Unless specified otherwise, configuration keys correspond to keys settable in Kernel/Config.pm in the installation directory, or in the SysConfig tab in the Admin zone of the web interface.

3.1   Basic

As they depend on local conditions, we will not describe the settings of the webserver, paths and logging.

It is worth mentioning that OTRS offers a wide range of authentication methods and modules, so it has no problem cooperating with the CESNET CAAS (authentication against an LDAP directory server):

$Self->{'AuthModule'} = 'Kernel::System::Auth::LDAP';
$Self->{'AuthModule::LDAP::Host'} = 'ldaps://ldap2.cesnet.cz';
$Self->{'AuthModule::LDAP::BaseDN'} = 'dc=cesnet,dc=cz';
$Self->{'AuthModule::LDAP::UID'} = 'uid';

We had to consider encoding problems - incident report mails can arrive from the whole world, so the only option able to encompass the widest variety of characters is Unicode, supported by OTRS in its UTF-8 encoding.

$Self->{'DefaultCharset'} = 'utf-8';

A small inconsistency must also be dealt with when setting base directory paths different from those the vendor expects - when using mod_perl, OTRS provides a startup script in scripts/apache2-perl-startup.pl, which does not respect the main configuration file settings, so the paths must be edited directly in this script.

3.2   Useful

Sometimes mail does not reach its destination. Something goes wrong, a mail delivery report is generated and arrives back in OTRS. These reports usually do not retain the original subject, but often contain at least remnants of the original message, attached or inline. If OTRS does not find the ticket identification number in the subject, it is able to search other parts of the message. Enabling the following options is thus desirable:

$Self->{'PostmasterFollowUpSearchInReferences'} = 1;
$Self->{'PostmasterFollowUpSearchInBody'} = 1;
$Self->{'PostmasterFollowUpSearchInAttachment'} = 1;
$Self->{'PostmasterFollowUpSearchInRaw'} = 1;
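
The effect of these options can be pictured with a small Python sketch (it is not OTRS code). It assumes that the subject tag has the form `[Ticket#<digits>]`; the real hook and number format are configurable in OTRS, and the function name and the search order used here are illustrative only.

```python
import re
from email import message_from_string

# Assumed tag format; the real hook and number format are configurable in OTRS.
TICKET_TAG = re.compile(r"\[Ticket#(\d+)\]")

def find_ticket_number(raw_mail):
    """Return the first ticket number found in subject, References, parts or raw text."""
    msg = message_from_string(raw_mail)
    candidates = [msg.get("Subject", ""), msg.get("References", "")]
    # Text of every non-multipart part (body and attachments).
    for part in msg.walk():
        if not part.is_multipart():
            payload = part.get_payload(decode=True) or b""
            candidates.append(payload.decode("utf-8", errors="replace"))
    # Last resort: the raw message source.
    candidates.append(raw_mail)
    for text in candidates:
        match = TICKET_TAG.search(text)
        if match:
            return match.group(1)
    return None

# A bounce whose own subject lost the tag, but which quotes the original message.
bounce = (
    "From: MAILER-DAEMON@example.org\n"
    "Subject: Undelivered Mail Returned to Sender\n"
    "\n"
    "The original message follows.\n"
    "Subject: [Ticket#2007112710000015] Port scan from your network\n"
)
print(find_ticket_number(bounce))
```

The bounce above would be lost to plain subject matching, but the quoted original subject in the body still carries the identifier.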

On the other hand, OTRS by default tries to check the functionality of the destination MX server. However, an unreachable destination server does not necessarily mean a prolonged outage. It may be just a momentary lapse, and we would like to let the message stay in the queue and try again. It is advisable to turn off this functionality with

$Self->{'CheckMXRecord'} = 0;

3.3   Mail authenticity

OTRS supports both PGP and S/MIME formats for signing mail messages. A drawback and an advantage at once is that a signing key can be assigned only per queue, not per sender. In our case this is the desired behaviour, but that does not mean it suits all CSIRT key management schemes.

Web-based mailing also brings the inevitable security weakness of having an unencrypted signing key (or an encrypted key together with its passphrase) directly on the server - the administrator of the OTRS machine then has full access to this key. We (as a CSIRT team) must be ready to revoke our global key and deal with the consequences in case of personnel changes.

The necessary keys had to be set up and attached to particular queues in the Admin area. Also note that the keys must not be encrypted, or the password for encrypted keys must be present in the configuration file.

3.4   Mail architecture

Our mail setup accepts mail for the certs@, abuse@ and postmasters@ addresses in the main CESNET domains.

[Figure]

Figure 1: Mail delivery architecture

OTRS is able to accept mail by piping it to its auxiliary bin/PostMaster.pl script, or by POP3 polling. We have used the former method, mainly because of its flexibility. During the initial deployment, mail was dispatched by Postfix directly into this script through the alias file; later the script was called by the IP harvesting script, and later still everything was wrapped in the Maildrop mail delivery agent with more granular rules.

Incoming mail is accepted and processed by a usual Postfix setup, with a detour through Amavis with ClamAV and SpamAssassin. The SpamAssassin setup is worth mentioning - we do not want to accept spam, but we do want to receive incident reports, which can themselves contain spam. We have created a handcrafted whitelist of phrases which usually occur only in incident reports. This whitelist is actively maintained and evolves over time. The number of false positives stays low, and the volume of real spam is acceptable.

A backup mailbox, to which all incoming and outgoing mail is copied in real time, is also set up. Incoming mail is copied by means of a usual alias record; for outgoing mail, the OTRS feature of duplicating all outgoing mail is used:

$Self->{'SendmailBcc'} = 'backup-mailbox@example.cz';

The usefulness of Maildrop shows in connection with the handling of special OTRS headers. OTRS understands a defined set of mail headers, whose content can modify its behaviour - choose a particular queue or add some metadata. OTRS itself has a way to classify mails and define specific actions on them, but this support is limited, so putting a real delivery agent in front of it was a natural choice.

From the Maildrop script, messages are piped through the IP address harvesting script, discussed later.
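
The idea of steering OTRS through headers can be sketched in Python as follows. This is not our Maildrop configuration, only an illustration of the principle: a filter in front of PostMaster.pl decides on a queue and records the decision in the `X-OTRS-Queue` header. The queue names `Bounces` and `Spam` and the classification rules are assumptions made for the example.

```python
from email import message_from_string

def route(raw_mail):
    """Add an X-OTRS-Queue header based on simple, illustrative rules."""
    msg = message_from_string(raw_mail)
    sender = msg.get("From", "").lower()
    if "mailer-daemon" in sender:
        queue = "Bounces"      # hypothetical queue name
    elif msg.get("X-Spam-Flag", "").strip().upper() == "YES":
        queue = "Spam"         # hypothetical queue name
    else:
        queue = "Certs"        # incoming queue for incident reports
    del msg["X-OTRS-Queue"]    # drop any value supplied by the sender
    msg["X-OTRS-Queue"] = queue
    return msg.as_string()

routed = route("From: MAILER-DAEMON@example.org\nSubject: failure\n\nbody\n")
print(routed)
```

Removing a sender-supplied header before setting our own keeps outside parties from choosing the queue themselves.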

3.5   States/Queues

Once an incident report is received, the CESNET Monitoring centre checks its importance and solves simpler incidents on its own; otherwise it hands the report on to the main CESNET-CERTS members. Incident relevancy is assessed and, if necessary, additional information is requested. Next, reports are categorised according to the networks affected and, after consulting whois information, resent to their respective administrators. The responsible administrator then communicates directly with the original complainant (if needed) and finds a solution. If everything goes well, from this point onwards CSIRT acts only as a spectator and recorder. Depending on the importance of the report, the relevant administrator may be contacted and a response requested where CSIRT has not been informed about the resolution in time. Afterwards, the report is finalised and marked with the appropriate outcome - solved, no solution needed, no response from administrator and so on.

We use the Certs queue as the incoming queue for incident reports; most of the basic incidents are tackled by Monitoring members directly there. When they find that a report needs additional care, they move it into the Certs-Masters queue.

The basic ticket lifetime is represented by the states New (newly arrived), Open (in progress), Closed successful (solved) and Closed unsuccessful (unsolved, or no longer dealt with for whatever reason).

For a more granular representation of the ticket lifetime we created the following additional states (we used Czech names, so this is a rough translation):

Update
Additional information arrived and was assigned to the ticket. This is achieved with the PostmasterFollowUpState directive. Together with our state colourisation patch, discussed later, this lets the operator spot new follow-ups to open tickets more easily.
Timeout
The ticket has gone some time without any reply or update. Tickets enter this state by means of our plugin. The operator is again clearly informed that he should review the ticket and possibly escalate it.

We also defined some states for classifying closed tickets, to be able to deduce the rough reason for closing - not only successful and unsuccessful, but also missing additional information, informational only, operational, and several others with internal meaning.

Under this taxonomy, spam and misdirected delivery reports (unsolicited bounces) should also be represented by a state, but as a state change is a costly operation from the usability standpoint (it needs more actions by the operator), and as a queue change is simpler, we created separate queues for them. This approach allows the Monitoring operators to quickly sift out slip-through spam and bounces and concentrate on real incident reports.

4   Development

Here we present the specific solutions and development which were necessary to achieve the goals outlined in the opening part of this report.

4.1   IP harvesting

OTRS is able to store key/data pairs along with the ticket. These pairs can be arbitrary, but key names can be predefined and made unchangeable. As we plan to attach at least an IP address, the network it belongs to (according to the RIPE database), and a responsible administrator contact deduced from the network block information, we have rigidly defined these keys as the first three metadata values, under the names NETNAME, IP, ADMIN.

These fields are editable, so a human operator can spot and correct possible errors. However, the data should be prefilled in some way, to ease the burden of hunting them down and filling them in by hand.

We considered various schemes of automatic mail analysis, and after some testing we settled on a simple approach.

The overwhelming majority of incidents contain only one IP address from AS2852 (the CESNET IP ranges). Our analyser breaks the mail into its MIME subparts and searches the subject, the main body and all attached data recursively for anything conforming to an IP address format. This can result in a large number of addresses unconnected with CESNET networks, so we keep only those belonging to the CESNET network space and remove duplicates. This usually yields only one IP address; when the result contains more addresses, we leave the decision to a human operator later on. Only a human with the respective context from the mail message can conclude whether the incident report concerns more IP addresses (and should be split into two tickets) or whether the second address is bogus.
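
The approach can be illustrated with a short Python sketch. It is not the harvesting script itself. The single network block in `OWN_NETWORKS` is an assumption standing in for the full list of CESNET ranges, and the regular expression and function name are illustrative.

```python
import ipaddress
import re
from email import message_from_string

# Illustrative only: a single block stands in for the real list of own ranges.
OWN_NETWORKS = [ipaddress.ip_network("195.113.0.0/16")]
# Four dot-separated groups of 1-3 digits, not embedded in a longer dotted number.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")

def harvest_ips(raw_mail):
    """Return unique addresses from our networks, in order of first appearance."""
    msg = message_from_string(raw_mail)
    texts = [msg.get("Subject", "")]
    for part in msg.walk():                  # walks nested MIME parts too
        if not part.is_multipart():
            payload = part.get_payload(decode=True) or b""
            texts.append(payload.decode("utf-8", errors="replace"))
    found = []
    for text in texts:
        for candidate in IPV4_CANDIDATE.findall(text):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:               # looks like an address, but is not one
                continue
            if any(address in net for net in OWN_NETWORKS) and address not in found:
                found.append(address)
    return [str(address) for address in found]

report = (
    "Subject: Scan from 195.113.144.199\n"
    "\n"
    "Source 195.113.144.199 probed 192.0.2.10 port 22.\n"
)
print(harvest_ips(report))
```

When the returned list holds more than one address, the decision is left to the operator, as described above.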

The addresses obtained are then looked up in the RIPE database (thanks to Pavel Vachek's scripts from the IDS project [Vac06]) and accompanied by the RIPE netname and the responsible administrator contact. The resulting information is inserted into the mail headers in a form understood by OTRS, which extracts the data and assigns it to the respective metadata fields. This is an example of the generated headers:

X-Otrs-TicketKey1: NETNAME
X-Otrs-TicketValue1: CESNET-BB4
X-Otrs-TicketKey2: IP
X-Otrs-TicketValue2: 195.113.144.199
X-Otrs-TicketKey3: ADMIN
X-Otrs-TicketValue3: abuse@cesnet.cz
[Figure]

Figure 2: Address fetched from mail to metadata
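
Generating headers of the form shown above is straightforward. The following Python sketch is illustrative and not taken from our script: it assumes the three values have already been obtained, the function names are hypothetical, and `prepend_headers` assumes the raw mail starts directly with its header block.

```python
def ticket_headers(netname, ip, admin):
    """Format the three metadata pairs as X-Otrs-TicketKeyN/TicketValueN headers."""
    pairs = [("NETNAME", netname), ("IP", ip), ("ADMIN", admin)]
    lines = []
    for index, (key, value) in enumerate(pairs, start=1):
        # Collapse whitespace so a value can never start a new header line.
        clean = " ".join(str(value).split())
        lines.append(f"X-Otrs-TicketKey{index}: {key}")
        lines.append(f"X-Otrs-TicketValue{index}: {clean}")
    return "\n".join(lines)

def prepend_headers(raw_mail, headers):
    """Put the generated headers in front of the existing header block."""
    return headers + "\n" + raw_mail

headers = ticket_headers("CESNET-BB4", "195.113.144.199", "abuse@cesnet.cz")
print(prepend_headers("Subject: Scan report\n\nbody\n", headers))
```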

The script is available online.

4.2   Response templates for forward

As we need to pass incident reports on to the responsible administrators of the networks or machines concerned, the absence of templates for forwarding is limiting. We have thus rewritten practically the whole forwarding code, which is now capable of using globally defined system templates. We had to modify the user interface code and the related page templates to present the additional functionality to users.

The code works fine, despite some deficiencies. OTRS does not differentiate template types, hence we use system-wide templates, which may be confusing for users, because the same set of templates is available for both reply and forward. For now we distinguish them by name, but we plan to implement a deeper split. We have yet to analyse the situation carefully, because this would need more invasive changes to the OTRS data model, which could seriously raise maintenance and upgrade complexity. We have also taken some steps to propagate our code upstream.

The patch is available online.

4.3   Automatic recipient

The forwarding action is mostly used for passing on incident reports. Information about the target email address is already available in the metadata from the IP harvesting script, so using it is the obvious choice. We modified the forwarding code to use this supplied information, if available. Passing on an incident is now possible with 4 clicks, without the need to fill in anything by hand.

The patch is a part of the previous code.

4.4   Statistics

To be able to compare the incident-solving success rate of our members and constituency, we had to consider how to retrieve some statistical data. OTRS has a basic statistics module; however, its functionality is limited to basic time/state/queue based counts. As the basic data model of OTRS is nicely transparent, fetching more complex data is a straightforward matter of a suitably crafted SQL query, so we again used our own script, which subsequently processes the results and formats them into a visually and informationally acceptable form. We were also able to add data from other sources (annotating institutions with their full names instead of RIPE abbreviations) and to present the results in a more visually convenient way.

[Figure]

Figure 3: Generated statistics example
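
The kind of query involved can be sketched in Python with an in-memory SQLite database. The schema below is a deliberately simplified, hypothetical stand-in (the real OTRS tables differ), and the netnames and institution names are made up for the example.

```python
import sqlite3

# Hypothetical, simplified schema; the real OTRS tables differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, netname TEXT, state TEXT);
    INSERT INTO ticket (netname, state) VALUES
        ('NET-A', 'closed successful'),
        ('NET-A', 'closed unsuccessful'),
        ('NET-A', 'closed successful'),
        ('NET-B', 'closed unsuccessful');
""")

# Hypothetical mapping from RIPE netname to the full institution name.
FULL_NAMES = {"NET-A": "Example University", "NET-B": "Example Institute"}

QUERY = """
    SELECT netname,
           SUM(state = 'closed successful')   AS solved,
           SUM(state = 'closed unsuccessful') AS unsolved
    FROM ticket
    GROUP BY netname
    ORDER BY netname
"""

def institution_stats(connection):
    """Return (institution, solved, unsolved) rows, annotated with full names."""
    return [
        (FULL_NAMES.get(netname, netname), solved, unsolved)
        for netname, solved, unsolved in connection.execute(QUERY)
    ]

for row in institution_stats(conn):
    print(row)
```

Keeping the aggregation in SQL and the annotation in the script mirrors the split described above: the database counts, the script decorates and formats.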

The script is available online.

4.5   CSS and states

The default OTRS interface is well arranged, but somewhat plain, with no strong visual emphasis on various aspects of tickets. We have modified the default templates to use the states as CSS identifiers for the ticket header, so changing the header style consists of just a definition in the CSS file (Kernel/Output/HTML/Standard/css.dtl). We have chosen to stick to colourisation only, but that is enough to clearly separate and highlight new, updated and timed-out tickets from the rest.
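
The principle of deriving a CSS identifier from a state name can be sketched in Python. This does not reproduce the OTRS template code; the `state-` prefix, the markup and the function names are assumptions made for the illustration.

```python
import html
import re
import unicodedata

def state_css_class(state_name):
    """Turn a state name (possibly with diacritics) into a CSS class name."""
    ascii_name = (
        unicodedata.normalize("NFKD", state_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    return "state-" + slug   # hypothetical prefix

def ticket_header(title, state_name):
    """Render a ticket header whose class is derived from the ticket state."""
    return '<div class="{}">{}</div>'.format(
        state_css_class(state_name), html.escape(title)
    )

print(ticket_header("Port scan report", "Closed successful"))
```

With such class names in place, a new colour for a state needs only one more rule in the style sheet.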

The patch is available online.

4.6   Raise state of wearily handled tickets

Often an incident report is handed over and then stays in the queue, waiting for an answer. As soon as the answer arrives, the ticket is highlighted and the operator knows he can continue handling it. When no answer comes within some time, we wanted the ticket to be highlighted in another way, so that the operator does not need to repeatedly go through all the tickets and check their time and state.

OTRS supports regular checking of tickets for some conditions and changing them accordingly, but the time can be checked only in relation to the ticket creation, not its last update. However, the time of the last update is stored internally by OTRS. We have thus created an auxiliary script (running as one of the OTRS cron scripts), which goes through open tickets, checks the time of the last update, and changes the state of tickets exceeding some timeframe (in our case 3 days). In cooperation with the state styling patch, this appears as a colour change of the ticket in the user interface. Timed-out tickets are thus not left rotting in the queue until somebody accidentally spots them.

While developing the script, we had to step aside from the usual OTRS ways, and combine direct access to the database with the internal object model. We execute usual SQL statements over the relational repository, which gives us a list of affected ticket identifiers. We then use this list to instantiate real OTRS ticket objects, and use their methods for a full featured manipulation. This ensures all auxiliary structures are updated accordingly, along with history messages.
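
The two-step structure (select identifiers first, then change state through the object layer) can be outlined in Python. This is a sketch, not the script itself: the ticket records are plain dictionaries, the state names are illustrative, and the `set_state` callback stands in for the OTRS ticket object methods.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(days=3)   # the limit used in our deployment

def find_timed_out(tickets, now):
    """Step 1: list ids of open tickets whose last update is older than TIMEOUT."""
    return [
        ticket["id"] for ticket in tickets
        if ticket["state"] == "open" and now - ticket["changed"] > TIMEOUT
    ]

def apply_timeouts(tickets, now, set_state):
    """Step 2: change state via a callback standing in for the ticket object."""
    ids = find_timed_out(tickets, now)
    for ticket_id in ids:
        set_state(ticket_id, "timeout")
    return ids

now = datetime(2007, 11, 27, 12, 0)
tickets = [
    {"id": 1, "state": "open", "changed": datetime(2007, 11, 20, 9, 0)},
    {"id": 2, "state": "open", "changed": datetime(2007, 11, 26, 9, 0)},
    {"id": 3, "state": "closed successful", "changed": datetime(2007, 11, 1, 9, 0)},
]
log = []
apply_timeouts(tickets, now, lambda ticket_id, state: log.append((ticket_id, state)))
print(log)
```

Routing the change through a single callback is what keeps history and auxiliary structures consistent in the real system, since no state is written behind the object model's back.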

The patch is available online.

4.7   Check of Monitoring Time Limits

As time is important, we needed the first-tier operators (the Monitoring centre) to handle and sift reports within a tight timeframe. We set hard limits for their operation and, based on the previous time checking script, created another version, which checks the last change of the tickets in the Certs queue and sends email notifications if the limits are exceeded.

Judging by the output of this script, it is worth mentioning that the Monitoring centre is exceptionally good at complying with the time limits.

The patch is available online.

4.8   Check of Excessive Amounts of Reports from One IP Address

We consider some incident reports solely informational. However, a higher number of common incident reports about one particular IP address from various sources may foreshadow a more serious problem, so the severity of the incident should be re-evaluated by a human operator.

Again, based on the previous work and principles, we created a script which checks for an unusual number of incidents from one IP address and sends an email notification if a suspicious number is exceeded.

Results usually correlate with data from CESNET IDS system.
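
The core of such a check is a simple count per address. The Python sketch below is illustrative only: it takes a list of the IP metadata values of recent tickets, and the threshold value is a made-up example, not the limit we use.

```python
from collections import Counter

def suspicious_ips(ticket_ips, threshold):
    """Return {ip: count} for addresses that appear in more than `threshold` tickets."""
    counts = Counter(ticket_ips)
    return {ip: count for ip, count in counts.items() if count > threshold}

# IP metadata of recent tickets (example values).
recent = ["195.113.144.199", "195.113.144.10", "195.113.144.199", "195.113.144.199"]
print(suspicious_ips(recent, threshold=2))
```

The resulting dictionary would then be turned into the notification mail for the operator.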

The patch is available online.

4.9   Bugfix - Coping with Certain Attachment Encodings

OTRS uses hardwired Base64 for all attachments. However, the underlying Perl MIME::Parser module is not able to cope with it for all types of attachments, for example message/rfc822 (another mail message attached), which yields the error notice "can't have encoding base64 for message type message/delivery-status". As attaching another message is a common situation, for example when resending delivery report messages, we modified the OTRS mailing routines to employ MIME::Parser autodetection, which can fall back to 8-bit encoding when necessary.
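
The rule behind this error is that MIME permits only the 7bit, 8bit and binary transfer encodings for composite types (message and multipart). The following Python function is a simplified stand-in for the selection logic, not the MIME::Parser autodetection and not our patch; its name is hypothetical.

```python
def transfer_encoding(content_type):
    """Pick a Content-Transfer-Encoding that is legal for the given MIME type."""
    main_type = content_type.split("/", 1)[0].strip().lower()
    if main_type in ("message", "multipart"):
        return "8bit"     # base64 is not permitted for composite types
    return "base64"       # safe default for other attachments

for mime_type in ("application/pdf", "message/rfc822", "message/delivery-status"):
    print(mime_type, transfer_encoding(mime_type))
```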

The patch is available online.

4.10   Bugfix - Unexpected Anonymous Bind to LDAP

After one of the upgrades we observed unexpected behaviour - OTRS started to baulk at LDAP authentication. Closer inspection showed that the code had stopped taking possible anonymous binds for authentication into consideration; the cause was a change of semantics in the underlying Perl module. We fixed this behaviour, and so did (independently) the authors in later versions.

The patch is available online.

5   Future

While we were working on the OTRS modifications, another major version was released. One of our near-term aims is to review its changes and features, merge our modifications and upgrade the running instance. After some cleanup we would like to offer some of our changes upstream, mainly the forwarding templates, CSS/states and the attachment encodings bugfix. Some steps in this direction have already been made, although with no clear outcome yet.

We would also like to solve some more minor quirks - promote forwarding to a real first-class citizen (for example by distinguishing "Re:" and "Fwd:" in the subject), and allow for a more direct state change (now possible only during a ticket update; it would be useful in some specific situations).

One of the issues to evaluate is federation. Federated login would fit naturally into the CESNET federated infrastructure, but it has some security consequences, which will have to be considered carefully in light of CSIRT policies.

We also keep an eye on the IODEF and IDMEF incident and intrusion description formats. Our simple approach works fine, and is likely to stay at least as a fallback, but if the standard exchange formats proliferate, more precise target identification (and a standard form of further distribution) may become feasible. IPv6, as a growing security target, must also be considered.

6   Conclusion

The CESNET OTRS ticketing system installation currently (after nearly a year of service) holds around 2000 tickets, not counting spam and unsolicited bounces. The OTRS interface is used by three core team members, as well as seven Monitoring centre operators, to manage incident reports for several hundreds of assigned network blocks.

According to the configuration and development experience and observations from users, the review of ticketing systems conducted last year [Kac06] has paid off, and the course set by its results, aside from some minor deficiencies, works well.

References

[Kac06] Kácha P. OTRS: Issue Management System Meets Workflow of Security Team. Technical report 7/2006, Praha: CESNET, 2006.
[ADM] Kammermeyer R. et al. OTRS 2.1 - Admin Manual. Online.
[DEV] Schöpplein C. et al. OTRS 2.1 - Developer Manual. Online.
[Vac06] Vachek P. CESNET Intrusion Detection System. Technical report 5/2006, Praha: CESNET, 2006.