LinuxJournalRipper

TL;DR: These scripts are made for Mac or Linux. They download every old issue of Linux Journal as PDF. To download the old issues locally, just run a command like ./LJ-ripper-....sh.
You may freely use my scripts under the BSD 2-Clause license.

(c) eliotlencelot, 2019.

What is LinuxJournalRipper

It is a script available for Mac and Linux that downloads the first 131 issues of Linux Journal as PDF.

Why?

On August 7, 2019, Linux Journal shut its doors for good. The website is now mostly open to everyone: the paywall has been removed (as of 2019-08) for the HTML version of Linux Journal, but the PDF/ePub/Mobi versions are still paywalled.
We, the people, fear that the website will also be shut down in the near future, as it represents a cost for a defunct company. Hence this emergency script, made to download the first 131 issues of Linux Journal as PDF!

The generated PDFs are portable, lightweight and free of advertisements (unlike a scanned file from that era).

What does this script do?

This (emergency) script:

goes to the Linux Journal main archive page https://secure2.linuxjournal.com/ljarchive/LJ/tocindex.html
downloads the list of Linux Journal issues.
goes to each issue's sub-site and downloads the list of articles in that issue.
downloads each article.
creates a single PDF per issue.
puts everything in a “Linux Journal” folder, with logs in a “Logbooks” folder, and quits cleanly.
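
The "download the list" steps above boil down to collecting links from an index page. The script itself relies on lynx for this; the following is only a minimal Python sketch of the same idea, using the standard library. The sample markup and the issue paths in it are hypothetical, and the real archive pages may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links into absolute URLs.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return every link found in an HTML string, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


ARCHIVE_INDEX = "https://secure2.linuxjournal.com/ljarchive/LJ/tocindex.html"
# Hypothetical markup, for illustration only.
sample = '<a href="001/toc001.html">Issue 1</a> <a href="002/toc002.html">Issue 2</a>'
issue_urls = extract_links(sample, ARCHIVE_INDEX)
```

The same function could be applied a second time to each issue page to list its articles.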

Which systems can run this script?

This (emergency) script runs on:

Apple macOS: Intel Mac, see the Mac folder for more information.
FreeBSD: x86 or AMD64, see the Mac folder for more information.
GNU/Linux: x86, see the Linux 32 bits folder for more information; AMD64, see the Linux 64 bits folder for more information; other architectures as long as the dependencies exist.
Microsoft Windows: should work with Cygwin or WSL as long as the dependencies exist.

How long did it take?

It took half a night on an old Intel Core 2 Duo processor with a gigabit connection.

License of my script

My scripts are BSD-2:

My scripts use the BSD 2-Clause “Simplified” license; see the LICENSE file.

Dependency licenses:

GNU Core Utilities, lynx, WeasyPrint and, if needed, ImageMagick are all free and open-source tools. I did not modify their source code; I just call them. Their licenses are, respectively, GPLv3, GPLv2, BSD-3 “Revised” and Apache. See their websites for more information.
CPDF is open-source but not FLOSS: the code is accessible online and there are no fees for personal use, but there are fees for commercial use. License for CPDF is available here. It also adds a watermark.

It is possible to replace CPDF with ImageMagick, a project under the Apache license, although the quality will be lower, by either:

replacing the ../cpdf *.pdf -o "Linux Journal - $INDEX.pdf" command with convert *.pdf "Linux Journal - $INDEX.pdf" in the LJ-ripper-[sth].sh script for your platform
or, on Linux, using the script from Linux others.

How to dump every issue of the Linux Journal:

Everything in the best possible quality, but with ads

Download issues n°301 (2019-08) to n°132 (2005-04) as official publication PDFs here: https://drive.google.com/open?id=1FuU1N7tGNb-gDfrs5In_sqyPCwZ6FE2p (2274 MB if only n°301 to n°132, 3344 MB otherwise)
Run the script to get n°131 (2005-03) to n°1 (1994-04) as generated PDFs. (287 MB)

Everything in good quality with the script

Other issues (>= 132) are available online in much better quality, in the grey part of the internet (see the previous subsection), so they have not disappeared for now. The script is still able to download them: you can modify it to download everything from 1994 up to August 2019. It will take a long time, but it is possible; just edit the source code.
To do so, remove these lines of code:

# Drop the leading lines so that only issues 1 to 131 remain.
# We already have better-quality PDFs for issues >= 132.
tail -n +171 url_of_issues.txt > tmp.txt
mv tmp.txt url_of_issues.txt
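
To make the effect of tail -n +171 concrete: it keeps everything from line 171 onward, so the first 170 lines are dropped. The short Python sketch below mimics that behavior. It assumes, without having checked the real file, that url_of_issues.txt holds one line per issue with the newest issue first; the placeholder list stands in for that file.

```python
def drop_leading_lines(lines, first_kept):
    """Mimic `tail -n +N`: keep everything from 1-indexed line N onward."""
    return lines[first_kept - 1:]


# Hypothetical stand-in for url_of_issues.txt: issues 301 down to 1.
issue_lines = [f"issue {number}" for number in range(301, 0, -1)]
kept = drop_leading_lines(issue_lines, 171)
```

Under that assumption, the 170 dropped lines are issues 301 to 132, which leaves exactly the 131 oldest issues.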

Installation and Usage

Main steps

This script should run well on macOS, FreeBSD and GNU/Linux.

Install the dependencies by copying and pasting the following snippet of code into a Terminal.
Download the whole project or just the folder associated with your OS.
Run in a shell as usual: ./LJ-ripper-[sth].sh (replace [sth] with the right word).

1) Dependencies

TODO : Change imagemagick to the new cpdf

Installing dependencies means ensuring you have everything needed for the program to run.
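
Before installing anything, it can help to see which tools are already on the PATH. The following is a small Python sketch using shutil.which. The executable names in REQUIRED are assumptions (for example, WeasyPrint is assumed to expose a weasyprint command and ImageMagick a convert command), and cpdf is left out because the scripts appear to call a bundled copy.

```python
import shutil


def missing_tools(tools):
    """Return the tools from the list that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed executable names; adjust them to match your platform.
REQUIRED = ["lynx", "weasyprint", "convert"]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```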

macOS

Open the Terminal
Run this script in a Terminal

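# Install Homebrew only if the brew command is not already available
if ! command -v brew >/dev/null 2>&1
then
	/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
fi
brew update

# GNU Core Utilities, then Python 3, lynx and the libraries WeasyPrint needs
brew install coreutils
brew install python3 cairo pango gdk-pixbuf libffi lynx
pip3 install WeasyPrint

# Bring already-installed packages up to date
brew upgrade python3 cairo pango gdk-pixbuf libffi lynx

echo "Everything should be installed."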

GNU/Linux

Open a Terminal.
Run the corresponding script (below) in a Terminal.

Debian based (Debian 9.0 Stretch or newer, Ubuntu 16.04 Xenial or newer) :

sudo apt-get update
sudo apt-get install lynx
sudo apt-get install imagemagick
sudo apt-get install build-essential python3-dev python3-pip python3-setuptools python3-wheel python3-cffi libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info
pip3 install WeasyPrint

Fedora :

sudo yum install redhat-rpm-config python-devel python-pip python-setuptools python-wheel python-cffi libffi-devel cairo pango gdk-pixbuf2 lynx
sudo yum install ImageMagick
pip3 install WeasyPrint

ArchLinux :

sudo pacman -S python-pip python-setuptools python-wheel cairo pango gdk-pixbuf2 libffi pkg-config lynx
sudo pacman -S imagemagick
pip3 install WeasyPrint

Gentoo :

emerge pip setuptools wheel cairo pango gdk-pixbuf cffi lynx imagemagick
pip3 install WeasyPrint

Windows 10

It may be possible to run the Linux others script under the Windows Subsystem for Linux on Windows 10.
You need Windows 10 on a 64-bit processor, build 16215 or later.

To find your PC’s architecture and Windows build number, open Settings > System > About and look for the OS Build and System Type fields.
Follow this Windows Subsystem for Linux Installation Guide for Windows 10
Do not forget to initialize the newly installed distro
Follow 2) and 3) normally. For 2), choose LJ-ripper-OtherArch.sh in the Linux others folder.

Troubleshooting Windows Subsystem for Linux

2) Download the project

You can run git clone on your computer.
Alternatively, click the Clone or download button in the GitHub web interface to get a zip of the whole project, including the Mac OS X and Linux folders.

3) Usage

Open a Terminal and run the script with something like: ./LJ-ripper-[sth].sh (replace [sth] with the right word).

The script runs as soon as it is launched, without asking you anything at any time. TODO: Change this behavior.
Please run it in a directory with enough free space (1.5 GB, I think).
Numerous temporary files will be written (> 2700 temporary files).
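
Since the script starts working immediately, checking the free space beforehand may save a failed run. This is a minimal Python sketch built on shutil.disk_usage; the 1.5 GB threshold is simply the rough estimate quoted above, not a measured requirement.

```python
import shutil

# Rough estimate taken from the text above (1.5 GB), not a measured value.
REQUIRED_BYTES = int(1.5 * 1000**3)


def has_enough_space(path, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    if has_enough_space("."):
        print("Enough free space in the current directory.")
    else:
        print("Less than 1.5 GB free here; consider another directory.")
```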

Licence

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Malware-Detector-Repeat

Abstract

Information security has become ubiquitous in this era. In this project we try to demonstrate a simple anti-malware prototype: a system monitor that watches the system, warns the user about problems such as a fork bomb or bad memory behavior, and then quarantines, kills, and removes the malware.

System Components

System Monitor

This is the main component of the system. It presents the user with a summary of system metrics, then asks whether they want more information about:

CPU
RAM
Disk
Network
a fresh summary

or whether to run a scan or exit. While the program is running, a background thread notifies the user about any warnings or potential threats to the system and runs a scan automatically in these cases.
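
The background-thread behavior described above can be sketched with the standard library alone. This is not the code in SysMonitor.py; start_monitor, check and the warnings queue are hypothetical names chosen for illustration. The idea is that a daemon thread calls a check function periodically and queues any warning for the main loop to display.

```python
import queue
import threading


def start_monitor(check, warnings, stop, interval=1.0):
    """Call check() periodically in a daemon thread and queue its warnings.

    check() should return a warning string, or None when all is well.
    """

    def loop():
        while not stop.is_set():
            message = check()
            if message:
                warnings.put(message)
            # Sleep, but wake up immediately if stop is set.
            stop.wait(interval)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

The main program would create a queue.Queue and a threading.Event, pass them in, and poll the queue between user prompts; setting the event stops the thread.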

Processes Scanner

It is a Python script that detects and kills the fork bomb malware, which overloads the OS and makes it uncontrollable.
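
One simple way to spot a fork bomb is to count how many running processes share the same name. The sketch below shows that heuristic on a plain list of (pid, ppid, name) tuples; it is an assumption about how detection could work, not the logic in Scan.py. In practice the snapshot could come from psutil.process_iter, and the threshold value here is arbitrary.

```python
from collections import Counter


def find_fork_bomb_suspects(processes, threshold=50):
    """Return process names that appear at least `threshold` times.

    `processes` is an iterable of (pid, ppid, name) tuples.
    """
    counts = Counter(name for _pid, _ppid, name in processes)
    return sorted(name for name, count in counts.items() if count >= threshold)


# Hypothetical snapshot: sixty copies of "bomb" next to two normal processes.
snapshot = [(pid, 1, "bomb") for pid in range(100, 160)]
snapshot += [(1, 0, "init"), (2, 1, "bash")]
suspects = find_fork_bomb_suspects(snapshot, threshold=50)
```

Counting by name keeps the example short; a real scanner might also look at how quickly the count grows or at the parent/child tree.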

Memory Scanner

The memory eater is a program that allocates and deallocates heap memory in variable-sized chunks, simulating memory-based or fileless malware. These types of malware exploit vulnerabilities in memory management to carry out malicious activities without relying heavily on files stored on disk. This scanner detects the offending program and then kills or stops its process.
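
A process that repeatedly allocates and frees large amounts of memory shows big swings in its memory usage over time. The sketch below flags such a pattern from a series of memory readings for one process. It is only an illustrative heuristic with made-up thresholds, not the detection used by this project; the readings could be sampled with psutil, for example.

```python
def looks_like_memory_eater(samples, min_swing, min_reversals=3):
    """Flag a process whose memory use swings up and down by large amounts.

    `samples` are memory readings in bytes, oldest first. A reversal is a
    large increase followed by a large decrease, or the other way round.
    """
    reversals = 0
    previous_direction = 0
    for before, after in zip(samples, samples[1:]):
        delta = after - before
        if abs(delta) < min_swing:
            continue  # ignore small fluctuations
        direction = 1 if delta > 0 else -1
        if previous_direction and direction != previous_direction:
            reversals += 1
        previous_direction = direction
    return reversals >= min_reversals


MB = 1024 * 1024
eater = [10 * MB, 200 * MB, 15 * MB, 300 * MB, 20 * MB, 250 * MB]
steady = [50 * MB, 51 * MB, 50 * MB, 52 * MB, 51 * MB]
```

With a min_swing of 100 MB, the eater series is flagged while the steady one is not.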

Getting started

VM 🖥️

The development environment is an Ubuntu Linux VM, and the project can be extended to other environments. The script is written in Python 3.

Download the files
Follow the installation steps:

sudo apt-get update
sudo apt-get install -y python3-pip
pip3 install psutil

We used the C programming language to build the malicious programs, and GCC to compile them:

sudo apt-get install gcc
gcc memEat.c -o memEat # compile the memory malware
gcc bomb.c -o bomb # compile the fork bomb

Run:

python3 main.py # run the antivirus

Open another terminal and run the malware you want to experiment with.

Docker 🐋

docker build . -t malware-test # build the image
docker run -it malware-test # run in interactive mode

The system monitor will appear. Then, in another terminal:

docker exec -it <container-name> bash

From there you can run the malware, interact with the detector, and experiment.

Sample Output – VM

Folder Structure

Refer to the following tree for information about important directories and files in this repository.

Malware-Detector-Repeat
├── screenshots         sample output
├── README.md           main documentation.
├── SysMonitor.py       reads and shows system info
├── Scan.py             the scanner: scans the system for vulnerabilities
└── main.py             driver code

transcription-service

The Guardian transcription service provides a simple user interface for Guardian staff members to upload audio/video
files they need a transcript for. It then runs the transcription and notifies the user when it is complete. Transcripts
can then be exported to Google Drive.

Technically, the tool is a bunch of infrastructure and UI that wraps around whisperX
and whisper.cpp. We’re very grateful to @ggerganov and @m-bain for their work
on these projects, which provide the core functionality of the transcription service.

For guardian staff – the runbook is here.

Get started

We use localstack to run SQS locally rather than needing to create ‘dev’ queues in AWS. This is set up via docker.

Get Janus creds (for fetching creds from AWS Parameter Store)
Use the scripts/setup.sh script to install dependencies, set up the nginx mapping and create a docker based sqs queue

nvm use
scripts/setup.sh

Run the express backend API:

npm run api::start

Run the Next.js frontend:

npm run client::start

If all goes well the frontend is available at https://transcribe.local.dev-gutools.co.uk and the backend is available at https://api.transcribe.local.dev-gutools.co.uk

Running gpu worker (whisperX) locally

Running the gpu/whisperx worker needs whisperx and associated dependencies to be available. If you have already run
setup.sh then the environment should be set up, and you can run npm run gpu-worker::start to activate the python
environment and run the worker. We use pipenv to manage the python environment.

The same python environment can be used to test changes to the model download python script.

NOTE: To get the API to actually send messages to the gpu queue, you’ll need to update the useWhisperX property in
config.ts – either by hard coding it or modifying the value in parameter store.

Emulating a production deployment

Occasionally you will want to develop something which relies on the specific ways we deploy into production.

In development we run two web servers. The client Next.js dev server has features like autoreloading on changes, and it proxies to the API express server.

In production we only run an express server which serves the client bundle whenever you hit a non-API endpoint. This is so that the clientside can handle routing for non-api endpoints.

If you are writing something that depends specifically on interactions between the API server and the frontend, you may want to check that it works in production. First update the config value of rootUrl to https://api.transcribe.local.dev-gutools.co.uk, then run npm run emulate-prod-locally. This will trigger a build and have your express web server provide the frontend bundle, rather than the Next.js server.

Then you can test the app using https://api.transcribe.local.dev-gutools.co.uk

Purging local queue

If you change the structure of messages on the queue you’ll probably want to purge all local messages. There’s a script
for that!

./scripts/purge-local-queue.sh

Whisper engine

This project currently makes use of both https://github.com/m-bain/whisperX and https://github.com/ggerganov/whisper.cpp.
WhisperX needs to run on a GPU instance with NVIDIA CUDA drivers and a mountain of python dependencies installed. To improve
transcript performance, these are baked into the AMI used for the transcription workers – see these prs for further details:

Currently we are trialling whisperx, with the hope of improved performance and speaker diarization support. There is an
intention, assuming whisperx has satisfactory performance, cost and transcript quality, to remove whisper.cpp,
thereby significantly simplifying our current infrastructure and the worker app.