Blogs

Alfresco Search for PDF Images using Transformations and Tesseract OCR

One of the great things about working in the Open Source space is that you sometimes get to work with NGOs such as Liberty Asia. Established in 2011, Liberty Asia is made up of a group of dedicated professionals from different industries who feel strongly that a more effective, coordinated response to slavery is essential, and that leveraging technology available to the corporate sector and providing it to the NGO sector will facilitate this response. As part of this, Liberty Asia is providing a dedicated collaboration platform to NGOs that fight human trafficking in Asia. The platform has to allow investigators to search for evidence embedded in scanned PDF image documents. Alfresco does not support this, so Seed developed a solution that uses the Tesseract OCR engine in conjunction with Alfresco transformations to meet this requirement.

Delete Alfresco Auditing Data

Our previous blog described the creation of a custom scheduled job that queries the Alfresco audit service and reports on access for each Share site. In this blog we share how to clear out the accumulated audit data after a predefined period of time.

Creating Site Access Reports from Audit

In our previous blog we described how to enable auditing, create a custom Audit Application, create a custom data extractor and check that the custom Audit Application is working. In this blog we will describe the creation of a custom scheduled job that queries the audit service and reports on access for each Share site.

Alfresco auditing and Site access reporting

This blog discusses an approach to reporting on site access and activity. We previously looked at the AuditShare add-on, with which we were quite impressed. However, we found that it did not perform well in an environment with thousands of users. The solution that best matched our customer's Alfresco environment includes the following aspects, each of which will be discussed in this and our next two blogs:

Alfresco Auditing was used to trap any access to content in a custom Audit Application.
A scheduled job was created to query the audit and report on access for each site.
The report was created monthly after which the audit entries were removed.
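
The three aspects above can be sketched in a few lines. This is only an illustration of the report-then-clear cycle, not the scheduled job described in these blogs: the entry fields (`site`, `user`) and the function names are hypothetical, and a plain in-memory list stands in for the audit service.

```python
from collections import Counter


def access_counts_by_site(entries):
    """Count audit entries per site; entries without a site are ignored."""
    counts = Counter()
    for entry in entries:
        site = entry.get("site")  # "site" is a hypothetical field name
        if site:
            counts[site] += 1
    return dict(counts)


def report_and_clear(entries):
    """Build the report, then empty the entry list once it has been reported."""
    report = access_counts_by_site(entries)
    entries.clear()
    return report


# Hypothetical audit entries collected during one reporting period.
audit_entries = [
    {"site": "hr", "user": "alice"},
    {"site": "hr", "user": "bob"},
    {"site": "legal", "user": "alice"},
    {"user": "carol"},  # access outside any site
]

monthly_report = report_and_clear(audit_entries)
print(monthly_report)      # {'hr': 2, 'legal': 1}
print(len(audit_entries))  # 0
```

Clearing only after the report has been built mirrors the order given in the list: the audit entries are removed once the monthly report exists.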

Alfresco Advanced Search forms and user selected Metadata values

This blog is a reflection on Seed IM's experience with configuring advanced search for custom metadata so that only the metadata items for which a user has selected values are included in the search.
The Alfresco Forms Framework allows you to configure search forms to search for content based on metadata fields. This blog discusses considerations for metadata fields that use lists of items and also how to ensure that the search only includes a metadata field in the criteria if a user has entered or selected a value.
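
The idea of including a field in the criteria only when the user has supplied a value can be sketched as a simple filter. This is a minimal sketch and not Alfresco Forms Framework code: the field names and the `selected_criteria` helper are hypothetical, and a plain dictionary stands in for the submitted search form.

```python
def is_blank(value):
    """Return True when a form field carries no user-selected value."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def selected_criteria(form_values):
    """Keep only the fields for which the user entered or selected a value."""
    return {field: value for field, value in form_values.items()
            if not is_blank(value)}


# Hypothetical submitted search form: two fields were left untouched.
form = {
    "department": "Finance",
    "region": "",            # dropdown left on its empty default
    "keywords": [],          # multi-value field with nothing selected
    "status": ["Approved"],
}

print(selected_criteria(form))
# {'department': 'Finance', 'status': ['Approved']}
```

Only `department` and `status` would reach the search criteria here; the untouched `region` and `keywords` fields are left out entirely instead of being matched against an empty value.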

Alfresco List Constraints and repeating metadata fields with Commas

This blog is a reflection on Seed IM's experience when configuring a repeating (multiple) metadata field with a list of values that contain commas.
Alfresco uses commas to signify a new value when a user enters fields using Share. Therefore, if you provide a dropdown which contains, for example, a value of 2,4-D, then when you save the value Alfresco assumes that you are setting two values, 2 and 4-D, in the multivalued metadata field. Unfortunately, neither value is valid from a constraint perspective, so you cannot save them to the content item.
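
The effect is easy to reproduce outside Alfresco. The following sketch only imitates the behaviour described above: `split_on_commas` and `invalid_values` are hypothetical helpers, and the allowed list is an invented example of a list constraint.

```python
def split_on_commas(raw):
    """Imitate a form layer that treats every comma as a value separator."""
    return [part.strip() for part in raw.split(",")]


def invalid_values(values, allowed):
    """Return the values that are not part of the list constraint."""
    return [value for value in values if value not in allowed]


# Hypothetical list constraint containing one value with a comma in it.
ALLOWED = ["2,4-D", "Glyphosate", "Atrazine"]

submitted = split_on_commas("2,4-D")
print(submitted)                           # ['2', '4-D']
print(invalid_values(submitted, ALLOWED))  # ['2', '4-D']

# Kept intact, the same value passes the constraint check.
print(invalid_values(["2,4-D"], ALLOWED))  # []
```

The single selection becomes two values once it is split, and neither of them appears in the constraint list, which is why the save is rejected.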

Public Access to documents in a Share site

During one of our recent projects, we were required to provide public access to certain pieces of content in a Share site. Alfresco has the concept of Shared content, which provides a preview of a piece of content. However, it is not possible to access the content file itself.
This blog explains a mechanism for providing a document link against a piece of content so that it can be accessed publicly without needing to log in to Alfresco.

Alfresco and AutoCAD integration

In recent weeks, we have received a few queries about the possibility of integrating CAD (Computer Aided Design) software with Alfresco, so we thought it was a good time to share our experience with an Alfresco and AutoCAD integration solution provided by Formtek.
AutoCAD has been at the forefront of the engineering drawing world for decades. AutoCAD content is widespread, and wherever engineering drawings are involved, the company is likely to be using AutoCAD to produce them. It is therefore important that any ECM solution you consider can support AutoCAD drawings.
Formtek, an organisation specialising in content management solutions mainly for the engineering, aerospace and manufacturing industries, created an Alfresco-AutoCAD integration solution called Engineering Data Management (EDM).

Working with Modules in Alfresco

Recently we came across a situation where we needed to uninstall a previously installed module for one of our customers, but we did not have a window to restart Alfresco. So we thought we would write a simple blog about how to uninstall a module without restarting Alfresco using the Module Management Tool (MMT), and also explore the other MMT commands.

Alfresco Disaster Recovery

Our previous blog (Alfresco Repository Clustering) gave you an insight into using clustering as a means of running an Alfresco system with high availability and performance.
In this blog we will describe two patterns for HA / Disaster Recovery (DR) for when you do not have the option of clustering, or when you need a DR instance in a different geographical location.
(Note: These methods can also be used for additional levels of DR in conjunction with the clustered architecture from the previous blog.)
Disaster Recovery involves pushing your repository data to a separate location that can be used in the event of a loss of the primary production data. The SLA for data recovery in DR will determine how you back up data to the DR environment. The following two options will be covered in this blog:
Delayed Recovery: Scheduled backups with loss of up to one day of data.
Real-time Recovery: Continuous backup with minimum loss of data.
In both methods, the Alfresco server software on the DR instance is not running, which prevents simultaneous updates of the repository by the DR server.
